================================================================
PATENT POSTURE AND LICENSE-SCOPE NOTICE
(informational; does not modify the Apache-2.0 license in LICENSE)
================================================================

Why Apache-2.0 for tsugi-mend (NOTICE PREAMBLE)

This SDK (tsugi-mend, the "Software") is licensed under the
Apache License, Version 2.0 ("Apache-2.0"), with its full automatic
patent grant. See the LICENSE file for the full Apache-2.0 license text.

TsugiCinema, Inc. ("Licensor") publishes a separate, patent-aligned
SDK at github.com/tsugiai/tsugi-kpool ("tsugi-kpool"),
which is also licensed under the Apache License, Version 2.0. That
SDK is the implementation of two TsugiCinema US provisional patents
(US App. 64/060,315 K-Pool LoRA and US App. 64/055,093 Infinity) at
LoRA-adapter granularity. The Apache-2.0 patent grant in Section 3 on
that SDK extends to those patent estates as practiced by the SDK code
as distributed, bounded by Section 3's "necessarily infringed by their
Contribution" language.

THIS SDK (tsugi-mend) IS A DELIBERATELY SEPARATE WORK. It does
not exercise either of TsugiCinema's two patent estates listed
above. It implements public-art reducer and orchestration components,
plus additional public-art helper components:

  - Decoupled DiLoCo for Resilient Distributed Pre-training
    (Arthur Douillard et al., arXiv:2604.21428, April 2026)
  - DES-LOC / Local Adam (Iacob et al., arXiv:2505.22549, May 2025)
  - Async Tensor Parallelism (PyTorch / TorchTitan, September 2024)
  - FALCON fail-slow detection (arXiv:2410.12588, October 2024)
  - PowerSGD low-rank gradient compression (Vogels et al.,
    arXiv:1905.13727, NeurIPS 2019)

Because this SDK does not exercise TsugiCinema's K-Pool LoRA or
Infinity patents, the Apache-2.0 automatic patent grant on this SDK
does not extend a license to either of those patent estates. Users who
wish to use the patent-aligned mechanisms covered by US App. 64/060,315
or US App. 64/055,093 should engage with TsugiCinema separately; those
mechanisms are NOT present in this SDK and the Apache-2.0 grant on this
SDK does not reach them.

The two SDKs share zero code. This SDK was scoped on 2026-05-21 to
maximize measured throughput uplift on cross-rack distributed
training using public-art techniques only, so that the Apache-2.0
patent grant could be offered without added scoping language and
without eroding the patent moat carried by tsugi-kpool.

================================================================

tsugi-mend
Copyright 2026 TsugiCinema, Inc.

This product includes software developed by TsugiCinema, Inc. and is
licensed under the Apache License, Version 2.0 (see LICENSE).

This software composes public prior art from the following sources.
Each citation is to a published reference relevant to code components
in this repository. Items 1 and 6 are the reducer/orchestrator paths
exercised by the current benchmark harness. Items 2 through 5 are
implemented as standalone components, validators, or integration points
in 0.1.x and are not all automatically invoked by `mend_init`. The
implementation here is original TsugiCinema, Inc. code.

1. Cross-rack reducer (GraceWindowSyncer state machine).
   Decoupled DiLoCo for Resilient Distributed Pre-training.
   Arthur Douillard; Keith Rush; Yani Donchev; Zachary Charles;
   Nova Fallen; Ayush Dubey; Ionel Gog; Josef Dean;
   Blake Woodworth; Zachary Garrett; Nate Keating; Jenny Bishop;
   Henry Prior; Edouard Yvinec; Arthur Szlam;
   Marc'Aurelio Ranzato; Jeff Dean (2026).
   arXiv:2604.21428.

2. Desynchronized optimizer momenta component (DES-LOC).
   DES-LOC: Desynced Low Communication Adaptive Optimizers for
   Training Foundation Models.
   Iacob, A. et al. (2025). arXiv:2505.22549.

3. Async tensor parallelism integration component.
   TorchTitan: PyTorch Pre-training Native Library.
   Wanchao Liang, Tianyu Liu, Less Wright, Will Constable,
   Andrew Gu, Chien-Chin Huang, Iris Zhang, Wei Feng,
   Howard Huang, Junjie Wang, Sanket Purandare, Gokul Nadathur,
   Stratos Idreos. PyTorch (2024). https://github.com/pytorch/torchtitan

4. Fail-slow detection component (FALCON sliding-window z-score).
   FALCON: Pinpointing and Mitigating Stragglers for Large-Scale
   Hybrid-Parallel Training.
   Wu, T. et al. (2024). arXiv:2410.12588.

5. Optional gradient compression primitive (PowerSGD with error feedback).
   PowerSGD: Practical Low-Rank Gradient Compression for Distributed
   Optimization.
   Vogels, T.; Karimireddy, S. P.; Jaggi, M. (2019).
   NeurIPS 2019. arXiv:1905.13727.
   The PowerSGD primitive in `src/tsugi_mend/compression.py` is a
   from-scratch reproduction of the rank-r power-iteration algorithm
   with persistent error feedback. PyTorch's native
   `torch.distributed.algorithms.ddp_comm_hooks.powerSGD_hook` was
   considered but is DDP-bucket-bound and not directly reusable for
   the GraceWindowSyncer fragment-merge path.

6. Concurrent outer-step orchestrator.
   Original TsugiCinema, Inc. work. The orchestrator wraps the
   Decoupled DiLoCo control law (item 1) in an asyncio-task-based
   pattern that overlaps the cross-rack grace window with inner-step
   forward/backward compute. The orchestrator is convergence-equivalent
   to Decoupled DiLoCo by the Algorithm 2 staggering analysis; see
   `docs/convergence_equivalence_sketch.md` for the proof sketch.

Patent posture:
The Apache-2.0 license provides an automatic patent grant under Section
3 for patent claims necessarily infringed by Contributions in this
repository. The tsugi-mend SDK is patent-independent by deliberate
construction; it does NOT exercise the K-Pool LoRA (US App.
64/060,315) or Infinity (US App. 64/055,093) patent estates owned by
TsugiCinema, Inc. Those patent estates are implemented in the
companion patent-aligned SDK at github.com/tsugiai/tsugi-kpool and are
not present in this repository.
