Attribution for src/tether/kernels/ vendored sources
=====================================================

This directory contains code lifted from three upstream sources under
the Lift #5 program (see reflex_context/01_decisions/2026-05-19-fluxvla-lift-program.md
and the research sidecar features/01_serve/triton-fast-kernels_research.md).
The attribution is intentionally three-source because the upstream FluxVLA
repo is itself composed from two prior projects.

Where tether reuses these sources, the original file-level headers are
preserved verbatim (no comment stripping). Future re-vendoring from
FluxVLA's main branch must likewise be done with a manual `cp` to keep them
intact; auto-formatting tools that strip comments will break the attribution.


─────────────────────────────────────────────────────────────────────────────
1. FluxVLA (LimX Dynamics) — Apache-2.0
─────────────────────────────────────────────────────────────────────────────

Source repo:      https://github.com/limx-org/FluxVLA-Engine (private mirror)
Local copy:       tether/reference/FluxVLA/ (vendored copy)
License:          Apache License 2.0
Attribution:      LimX Dynamics — composing layer + Triton kernel orchestration

Files lifted from FluxVLA upstream:
- fluxvla/ops/triton/attention_triton_ops.py
    → src/tether/kernels/triton/attention.py    (360 LOC)
- fluxvla/ops/triton/position_embedding.py
    → src/tether/kernels/triton/position_embedding.py    (330 LOC)
- fluxvla/ops/triton/triton_utils.py
    → src/tether/kernels/triton/utils.py    (27 LOC)
- fluxvla/ops/atomic_ops.py
    → src/tether/kernels/atomic_ops.py    (502 LOC, composing layer)
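
The LOC figures above double as a cheap drift signal after a re-vendor. A
minimal sketch, assuming "LOC" means total physical lines (blank and comment
lines included) and that it runs from the tether repo root; the
`RECORDED_LOC` table and `loc_mismatches` helper are hypothetical, not
existing tether tooling.

```python
from pathlib import Path

# Hypothetical table transcribed from the FluxVLA section above.
# Assumption: "LOC" counts every physical line, including blanks and comments.
RECORDED_LOC = {
    "src/tether/kernels/triton/attention.py": 360,
    "src/tether/kernels/triton/position_embedding.py": 330,
    "src/tether/kernels/triton/utils.py": 27,
    "src/tether/kernels/atomic_ops.py": 502,
}


def count_lines(path: Path) -> int:
    """Return the number of physical lines in a text file."""
    return len(path.read_text(encoding="utf-8").splitlines())


def loc_mismatches(root: Path, recorded: dict[str, int]) -> dict[str, tuple[int, int]]:
    """Map each file whose line count differs to (recorded, actual).

    A missing file is reported with an actual count of -1.
    """
    mismatches = {}
    for rel_path, expected in recorded.items():
        path = root / rel_path
        actual = count_lines(path) if path.is_file() else -1
        if actual != expected:
            mismatches[rel_path] = (expected, actual)
    return mismatches


if __name__ == "__main__":
    for rel_path, (expected, actual) in loc_mismatches(Path("."), RECORDED_LOC).items():
        print(f"{rel_path}: recorded {expected} LOC, found {actual}")
```

A mismatch is not an error by itself (tether may have edited a file after
lifting it), but it flags which entries in this file need their LOC updated.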

Apache-2.0 license text reproduced in LICENSE-Apache-2.0.txt (this dir).


─────────────────────────────────────────────────────────────────────────────
2. Dexmal realtime-vla — MIT
─────────────────────────────────────────────────────────────────────────────

Source repo:      https://github.com/dexmal/realtime-vla
Source file:      pi0_infer.py (on main branch, Dec 2024 - Jan 2026 commits)
License:          MIT
Attribution:      Dexmal Research — original Triton kernel authorship

Files lifted from Dexmal upstream via FluxVLA's vendored copy:
- pi0_infer.py (Triton kernel definitions for matmul + RMSNorm)
    → src/tether/kernels/triton/matmul.py    (855 LOC)
    → src/tether/kernels/triton/norm.py    (395 LOC)

Each of these files retains the original Dexmal SPDX header:
    # Origin: Modified from
    # Upstream-URL: https://github.com/dexmal/realtime-vla/blob/main/pi0_infer.py
    # Upstream-Ref: main
    # SPDX-License-Identifier: MIT

MIT license text reproduced in LICENSE-MIT.txt (this dir).
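
Because comment-stripping tools are the main threat to this header, a
post-copy check is cheap insurance. A minimal sketch: `DEXMAL_HEADER` mirrors
the four lines quoted above, while the `has_dexmal_header` and
`missing_headers` helpers, the 20-line scan window, and tolerance of trailing
whitespace are illustrative assumptions.

```python
from pathlib import Path

# The four header lines quoted above, in the order they must appear.
DEXMAL_HEADER = (
    "# Origin: Modified from",
    "# Upstream-URL: https://github.com/dexmal/realtime-vla/blob/main/pi0_infer.py",
    "# Upstream-Ref: main",
    "# SPDX-License-Identifier: MIT",
)


def has_dexmal_header(text: str, scan_lines: int = 20) -> bool:
    """Return True if the header lines appear contiguously and in order
    within the first `scan_lines` lines of `text`.

    Assumption: the header sits near the top of the file (possibly after a
    shebang or encoding line); only trailing whitespace is ignored.
    """
    head = [line.rstrip() for line in text.splitlines()[:scan_lines]]
    n = len(DEXMAL_HEADER)
    return any(
        tuple(head[i:i + n]) == DEXMAL_HEADER for i in range(len(head) - n + 1)
    )


def missing_headers(paths: list[Path]) -> list[Path]:
    """Return the files whose Dexmal header is absent or altered."""
    return [p for p in paths if not has_dexmal_header(p.read_text(encoding="utf-8"))]
```

For example, `missing_headers([Path("src/tether/kernels/triton/matmul.py"),
Path("src/tether/kernels/triton/norm.py")])` should return an empty list
after a correct re-vendor.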


─────────────────────────────────────────────────────────────────────────────
3. LimX CUDA C++ extensions — Apache-2.0
─────────────────────────────────────────────────────────────────────────────

Source repo:      https://github.com/limx-org/FluxVLA-Engine (private mirror)
License:          Apache License 2.0
Attribution:      LimX Dynamics — CUDA C++ kernel extensions

Files lifted from FluxVLA's CUDA C++ extension directory:
- fluxvla/ops/cuda/matmul_bias/
    → src/tether/kernels/cuda/matmul_bias/
      (matmul_bias_forward.cpp + matmul_bias_forward_cuda.cu + Python wrapper)
- fluxvla/ops/cuda/gemma_rotary_embedding/
    → src/tether/kernels/cuda/gemma_rotary_embedding/
- fluxvla/ops/cuda/rotary_pos_embedding/
    → src/tether/kernels/cuda/rotary_pos_embedding/

These files fall under the same Apache-2.0 license as #1 and share its
license text (LICENSE-Apache-2.0.txt).
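
The CUDA extensions are vendored as whole directories rather than single
files. A minimal Python sketch of a `cp`-equivalent step, assuming the
upstream checkout sits at reference/FluxVLA/ under the tether repo root and
that these directories are straight copies with no tether-side edits; the
`VENDOR_MAP` table and `vendor` helper are hypothetical. `shutil.copy2` and
`shutil.copytree` copy file bytes unchanged, so headers survive, unlike a
formatter pass.

```python
import shutil
from pathlib import Path

# Hypothetical upstream -> tether mapping, transcribed from section 3.
VENDOR_MAP = {
    "fluxvla/ops/cuda/matmul_bias": "src/tether/kernels/cuda/matmul_bias",
    "fluxvla/ops/cuda/gemma_rotary_embedding": "src/tether/kernels/cuda/gemma_rotary_embedding",
    "fluxvla/ops/cuda/rotary_pos_embedding": "src/tether/kernels/cuda/rotary_pos_embedding",
}


def vendor(src_root: Path, dst_root: Path, mapping: dict[str, str]) -> list[Path]:
    """Copy each mapped file or directory byte-for-byte; return destinations.

    Nothing is reformatted, so comment headers are preserved exactly.
    """
    copied = []
    for src_rel, dst_rel in mapping.items():
        src, dst = src_root / src_rel, dst_root / dst_rel
        dst.parent.mkdir(parents=True, exist_ok=True)
        if src.is_dir():
            # dirs_exist_ok (Python 3.8+) overwrites existing files, but files
            # that upstream deleted are NOT removed from dst.
            shutil.copytree(src, dst, dirs_exist_ok=True)
        else:
            shutil.copy2(src, dst)
        copied.append(dst)
    return copied
```

Run from the repo root as `vendor(Path("reference/FluxVLA"), Path("."),
VENDOR_MAP)`, then review the result with `git diff` before committing.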


─────────────────────────────────────────────────────────────────────────────
Re-vendoring procedure (for future kernel updates)
─────────────────────────────────────────────────────────────────────────────

When PyTorch, Triton, or CUDA toolkit version drift breaks the vendored kernels:

1. Verify the failure is real by running the full L1 + L2 + L3 parity gates
   per features/01_serve/triton-fast-kernels_plan.md. Do not re-vendor for
   cosmetic reasons; every re-vendor adds drift risk.
2. Update reference/FluxVLA/ (`git submodule update --remote` or manual sync
   from FluxVLA main).
3. Re-run the manual `cp` commands recorded in this file's history (NOT a
   script that strips comments; the Dexmal SPDX headers must be preserved).
4. Re-run the L1 + L2 + L3 gates. If they are not passing again within
   2 working days, disable `--fast-kernels` in a hotfix release per the
   kill-trigger ADR (01_decisions/2026-05-20-fast-kernels-kill-triggers.md).
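
Step 4's deadline is counted in working days. A minimal sketch, assuming
working days are Monday to Friday with no holiday calendar and that the
clock starts on the day the failure is confirmed in step 1;
`working_days_between` and `should_disable_fast_kernels` are hypothetical
names, and the kill-trigger ADR remains the authority on the exact rule.

```python
from datetime import date, timedelta


def working_days_between(start: date, end: date) -> int:
    """Count Mon-Fri days in the half-open interval (start, end].

    Assumption: no holiday calendar; weekends are the only non-working days.
    """
    days = 0
    current = start
    while current < end:
        current += timedelta(days=1)
        if current.weekday() < 5:  # Monday=0 ... Friday=4
            days += 1
    return days


def should_disable_fast_kernels(failure_confirmed: date, today: date,
                                gates_passing: bool, limit: int = 2) -> bool:
    """True when the gates still fail once `limit` working days have elapsed."""
    return (not gates_passing
            and working_days_between(failure_confirmed, today) >= limit)
```

Under this reading, a failure confirmed on a Friday reaches the hotfix
threshold on the following Tuesday if the gates are still red.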

Two consecutive failed re-vendor attempts within 60 days trigger a mandatory
kill-decision review (Trigger 4 of the kill-trigger ADR).
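
Trigger 4 can be read as: the two most recent re-vendor attempts both
failed, and the second came within 60 days of the first. That reading is an
assumption (the ADR may instead measure the window differently); `Attempt`
and `trigger4_fires` are hypothetical names used only for illustration.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Attempt:
    """One re-vendor attempt: when it happened and whether the gates passed."""
    day: date
    succeeded: bool


def trigger4_fires(attempts: list[Attempt], window_days: int = 60) -> bool:
    """Return True if the last two attempts both failed within `window_days`.

    Assumption: "consecutive" means adjacent in chronological order, so a
    successful attempt between two failures resets the count.
    """
    ordered = sorted(attempts, key=lambda a: a.day)
    if len(ordered) < 2:
        return False
    prev, last = ordered[-2], ordered[-1]
    both_failed = not prev.succeeded and not last.succeeded
    return both_failed and (last.day - prev.day).days <= window_days
```

For instance, failures on 2026-06-01 and 2026-07-15 (44 days apart) would
fire the trigger, while a successful attempt in between would not.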
