Metadata-Version: 2.4
Name: alta-models-sft
Version: 1.0.0
Summary: ALTAModel SFT — instruction-tuned Kinyarwanda language models from YaliLabs.
Project-URL: Homepage, https://github.com/yalilabs/alta-models-sft
Project-URL: Repository, https://github.com/yalilabs/alta-models-sft
Project-URL: Issues, https://github.com/yalilabs/alta-models-sft/issues
Project-URL: Model Hub, https://huggingface.co/yalilabs
Author-email: YaliLabs <info@yalilabs.com>
License-File: LICENSE
Keywords: ALTA,Rwanda,african-nlp,instruction-tuning,kinyarwanda,language-model,llm,sft
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Education
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: Apache Software License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Requires-Python: >=3.10
Requires-Dist: huggingface-hub>=0.23
Requires-Dist: safetensors>=0.4
Requires-Dist: torch>=2.2
Requires-Dist: transformers>=4.40
Provides-Extra: all
Requires-Dist: build>=1.2; extra == 'all'
Requires-Dist: datasets>=2.18; extra == 'all'
Requires-Dist: fastapi>=0.110; extra == 'all'
Requires-Dist: psutil>=5.9; extra == 'all'
Requires-Dist: pydantic>=2.6; extra == 'all'
Requires-Dist: pynvml>=11.5; extra == 'all'
Requires-Dist: pytest-cov>=5; extra == 'all'
Requires-Dist: pytest>=8; extra == 'all'
Requires-Dist: ruff>=0.4; extra == 'all'
Requires-Dist: tensorboard>=2.15; extra == 'all'
Requires-Dist: tqdm>=4.66; extra == 'all'
Requires-Dist: twine>=5; extra == 'all'
Requires-Dist: uvicorn[standard]>=0.29; extra == 'all'
Provides-Extra: dev
Requires-Dist: build>=1.2; extra == 'dev'
Requires-Dist: pytest-cov>=5; extra == 'dev'
Requires-Dist: pytest>=8; extra == 'dev'
Requires-Dist: ruff>=0.4; extra == 'dev'
Requires-Dist: twine>=5; extra == 'dev'
Provides-Extra: serve
Requires-Dist: fastapi>=0.110; extra == 'serve'
Requires-Dist: pydantic>=2.6; extra == 'serve'
Requires-Dist: uvicorn[standard]>=0.29; extra == 'serve'
Provides-Extra: train
Requires-Dist: datasets>=2.18; extra == 'train'
Requires-Dist: psutil>=5.9; extra == 'train'
Requires-Dist: pynvml>=11.5; extra == 'train'
Requires-Dist: tensorboard>=2.15; extra == 'train'
Requires-Dist: tqdm>=4.66; extra == 'train'
Description-Content-Type: text/markdown

# alta-models-sft (internal)

Monorepo for the ALTA SFT runtime package and its training pipeline. This README is for **internal use** — anyone with repo access. The public-facing PyPI README is `PYPI_README.md` and ships with the wheel.

> **Confidential.** Training data, internal benchmarks, and unpublished checkpoints should never be checked in. See [`.gitignore`](.gitignore) for excluded paths.

---

## Contents

- [What's in this repo](#whats-in-this-repo)
- [First-time setup](#first-time-setup)
- [Day-to-day workflows](#day-to-day-workflows)
  - [Training a new SFT model](#training-a-new-sft-model)
  - [Testing a trained model](#testing-a-trained-model)
  - [Exporting for distribution](#exporting-for-distribution)
  - [Uploading weights to Hugging Face](#uploading-weights-to-hugging-face)
  - [Cutting a runtime package release](#cutting-a-runtime-package-release)
- [Architecture: how training and the package share code](#architecture-how-training-and-the-package-share-code)
- [Versioning policy](#versioning-policy)
- [Operations](#operations)
- [License](#license)

---

## What's in this repo

```
alta-models-sft/
├── src/alta_models_sft/          ← Runtime package (the only thing shipped to PyPI)
│   ├── modeling/                 ← Model architecture (RoPE, GQA, SwiGLU, blocks)
│   ├── inference/                ← ALTAChat, ChatML, sampling, masking
│   ├── hub.py                    ← Local + Hub model resolution
│   ├── cli.py                    ← `alta-sft` CLI
│   └── server.py                 ← FastAPI server (extra dep)
│
├── training/                     ← Training pipeline (stays in repo)
│   ├── train.py                  ← Main training entry point
│   ├── config.py                 ← All hyperparameters
│   ├── dataset.py                ← SFT dataset + ChatML masking + collator
│   ├── builder.py                ← Wraps ALTAModel for training
│   ├── checkpoint.py             ← TopK manager, save/load
│   ├── distributed.py            ← DDP setup
│   ├── deduplicate.py            ← MinHash + LSH dedup
│   ├── build_multiturn.py        ← Multi-turn synthesis from single-turn data
│   ├── resource_monitor.py       ← GPU/CPU/RAM telemetry
│   └── ...
│
├── scripts/                      ← Operational tools (never shipped)
│   ├── test_inference.py         ← 8-subcommand model tester
│   ├── export_for_release.py     ← Training checkpoint → release directory
│   └── upload_to_hub.sh          ← Safe Hub upload with validation
│
├── tests/                        ← pytest suite
├── .github/workflows/            ← CI: tests + PyPI release
├── pyproject.toml                ← Package metadata (controls what ships)
├── README.md                     ← THIS file (internal)
└── PYPI_README.md                ← Public README (gets bundled into the wheel)
```

**Important:** The wheel only includes `src/alta_models_sft/`. The `[tool.hatch.build.targets.wheel]` section in `pyproject.toml` enforces this, and CI fails if `training/`, `scripts/`, or `tests/` leak in.

---

## First-time setup

```bash
git clone git@github.com:yalilabs/alta-models-sft.git
cd alta-models-sft

python -m venv .venv
source .venv/bin/activate
pip install -e ".[all]"
```

Verify everything works:

```bash
pytest                                  # all tests should pass
ruff check src tests                    # lint should be clean
alta-sft --version                      # CLI installed
python -m training.train --help         # training importable
```

You also need:

```bash
huggingface-cli login                   # for Hub uploads
# Optional: set HF_TOKEN in your shell for non-interactive use
```

---

## Day-to-day workflows

### Training a new SFT model

#### 1. Prepare data

Training data goes in `./data/` (gitignored). Supported per-sample formats — any mix works in one JSONL:

```jsonl
{"question": "...", "answer": "..."}
{"messages": [{"role": "user", "content": "..."}, {"role": "assistant", "content": "..."}]}
{"instruction": "...", "input": "...", "output": "..."}
{"document": "...", "summary": "..."}
```
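Whatever the input shape, each record ends up as a user/assistant exchange before ChatML formatting. A minimal sketch of that normalization follows. The helper name `to_messages` and the way `instruction`/`input` and `document` are folded into the user turn are assumptions for illustration, not necessarily what `training/dataset.py` does.

```python
import json


def to_messages(sample: dict) -> list[dict]:
    """Normalize one JSONL record into a list of {"role", "content"} turns.

    Hypothetical helper: the joining rules for instruction/input and
    document are illustrative assumptions, not the training pipeline's own.
    """
    if "messages" in sample:
        return sample["messages"]  # already in chat form
    if "question" in sample:
        user, assistant = sample["question"], sample["answer"]
    elif "instruction" in sample:
        # Assumed convention: append the optional input after a blank line.
        extra = sample.get("input", "")
        user = f"{sample['instruction']}\n\n{extra}" if extra else sample["instruction"]
        assistant = sample["output"]
    elif "document" in sample:
        # Assumed convention: the document itself becomes the user turn.
        user, assistant = sample["document"], sample["summary"]
    else:
        raise ValueError(f"unrecognized sample keys: {sorted(sample)}")
    return [
        {"role": "user", "content": user},
        {"role": "assistant", "content": assistant},
    ]


record = json.loads('{"instruction": "Translate to English", "input": "Muraho", "output": "Hello"}')
print(to_messages(record))
```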

Recommended preprocessing pipeline:

```bash
# 1. Deduplicate (writes <output>.jsonl + .report.txt + .duplicates.jsonl + .stats.json)
python -m training.deduplicate \
    --input ./data/raw.jsonl \
    --output ./data/clean \
    --threshold 0.85

# 2. Synthesize multi-turn samples (helps with conversational coherence)
python -m training.build_multiturn \
    --input ./data/clean.jsonl \
    --output ./data/training.jsonl \
    --multiturn_ratio 0.3 \
    --max_chain_length 3

# 3. Hold out a validation split (any way you like, but shuffle ONCE so the two files don't overlap)
shuf ./data/training.jsonl > ./data/shuffled.jsonl
head -n 1000 ./data/shuffled.jsonl > ./data/testing.jsonl
tail -n +1001 ./data/shuffled.jsonl > ./data/training_split.jsonl
```
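`--threshold 0.85` is a similarity cutoff for near-duplicate detection. As a rough illustration of the MinHash idea behind `training/deduplicate.py`, here is a stdlib-only sketch. The word 3-gram shingling, 128 hash functions, and blake2b seeding are assumptions rather than the script's actual parameters, and the LSH banding step is omitted.

```python
import hashlib


def shingles(text: str, n: int = 3) -> set[str]:
    """Word n-grams of a lowercased text; very short texts become a single shingle."""
    words = text.lower().split()
    if len(words) < n:
        return {" ".join(words)}
    return {" ".join(words[i : i + n]) for i in range(len(words) - n + 1)}


def minhash_signature(text: str, num_perm: int = 128) -> list[int]:
    """For each seeded hash function, keep the minimum hash over all shingles."""
    grams = shingles(text)
    signature = []
    for seed in range(num_perm):
        signature.append(
            min(
                int.from_bytes(
                    hashlib.blake2b(f"{seed}:{g}".encode(), digest_size=8).digest(), "big"
                )
                for g in grams
            )
        )
    return signature


def estimated_jaccard(sig_a: list[int], sig_b: list[int]) -> float:
    """The fraction of matching slots estimates the Jaccard similarity of the shingle sets."""
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)


def is_near_duplicate(text_a: str, text_b: str, threshold: float = 0.85) -> bool:
    return estimated_jaccard(minhash_signature(text_a), minhash_signature(text_b)) >= threshold
```

Comparing every pair of signatures is quadratic in dataset size. LSH avoids that by splitting each signature into bands and only comparing pairs that collide in at least one band.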

#### 2. Run training

Single GPU:

```bash
python -m training.train \
    --pretrained_dir ./pretrained/alta_base \
    --train_data ./data/training_split.jsonl \
    --val_data ./data/testing.jsonl \
    --output_dir ./sft_output
```

Multi-GPU (DDP via `torchrun`):

```bash
torchrun --nproc_per_node=4 -m training.train \
    --pretrained_dir ./pretrained/alta_base \
    --train_data ./data/training_split.jsonl \
    --val_data ./data/testing.jsonl \
    --output_dir ./sft_output
```

Resume from a previous checkpoint:

```bash
python -m training.train \
    --resume ./sft_output/checkpoints/alta_epoch003_step1500_loss1.8234.pt \
    --train_data ./data/training_split.jsonl \
    --val_data ./data/testing.jsonl \
    --output_dir ./sft_output
```

Hyperparameters live in [`training/config.py`](training/config.py). Common ones can be overridden via CLI:

```bash
python -m training.train ... \
    --epochs 5 \
    --batch_size 16 \
    --target_lr 1e-5 \
    --max_seq_len 2048 \
    --grad_accum_steps 4
```
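When changing `--batch_size` or `--grad_accum_steps`, it helps to keep the effective batch (sequences per optimizer step) in mind. The sketch below assumes `--batch_size` is per GPU, as is common with DDP; check `training/config.py` to confirm before relying on the numbers.

```python
def effective_batch(batch_size: int, grad_accum_steps: int, world_size: int = 1) -> int:
    """Sequences contributing to one optimizer step (assumes batch_size is per GPU)."""
    return batch_size * grad_accum_steps * world_size


def max_tokens_per_step(batch_size: int, grad_accum_steps: int, world_size: int, max_seq_len: int) -> int:
    """Upper bound on tokens per step; shorter samples and padding lower the real count."""
    return effective_batch(batch_size, grad_accum_steps, world_size) * max_seq_len


# The overrides above on a 4-GPU torchrun job:
print(effective_batch(16, 4, world_size=4))             # 256
print(max_tokens_per_step(16, 4, 4, max_seq_len=2048))  # 524288
```

If you double the GPU count, halving `--grad_accum_steps` keeps the effective batch, and therefore roughly the behavior of a tuned `--target_lr`, comparable.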

#### 3. Monitor

```bash
tensorboard --logdir ./sft_output/tensorboard
tail -f ./sft_output/logs/train_rank0.log
```

Watch for: `val_loss` decreasing each epoch, `train_loss` not diverging from `val_loss`, sample generations becoming coherent. The expected `val_loss` range after epoch 1 is in `config.py` (`expected_val_loss_at_epoch_1`).
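The first two checks can be automated over per-epoch losses. A minimal sketch, assuming you have already pulled `train_loss` and `val_loss` per epoch out of TensorBoard or the log. The helper name is hypothetical, and the 0.3 gap tolerance is an arbitrary placeholder, not a project setting.

```python
def loss_warnings(train_loss: list[float], val_loss: list[float], max_gap: float = 0.3) -> list[str]:
    """Flag epochs where val_loss fails to improve or exceeds train_loss by more than max_gap."""
    warnings = []
    for epoch, (tr, va) in enumerate(zip(train_loss, val_loss), start=1):
        prev = val_loss[epoch - 2] if epoch > 1 else None  # previous epoch's val_loss
        if prev is not None and va >= prev:
            warnings.append(f"epoch {epoch}: val_loss did not decrease ({prev:.4f} -> {va:.4f})")
        if va - tr > max_gap:
            warnings.append(f"epoch {epoch}: val_loss exceeds train_loss by {va - tr:.4f}")
    return warnings


print(loss_warnings([2.1, 1.8, 1.4], [2.0, 1.9, 1.95]))
```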

#### 4. When training finishes

`train.py` automatically calls `save_pretrained()` on the best model. The output is at `./sft_output/alta_sft_final/`:

```
sft_output/alta_sft_final/
├── config.json                   # includes model_format_version
└── model.safetensors             # safetensors format, ready to distribute
```

This directory is **already in the distribution format** — you can load it immediately:

```bash
alta-sft chat --model ./sft_output/alta_sft_final
```
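Before sharing a directory, a quick stdlib-only check can confirm it contains what you expect. The safetensors layout is public (an 8-byte little-endian header length, then a JSON header mapping tensor names to `dtype`, `shape`, and `data_offsets`, plus an optional `__metadata__` entry), so the header can be read without loading any weights. The function name and the shape of the returned summary are illustrative.

```python
import json
import math
import struct
from pathlib import Path


def inspect_model_dir(model_dir: str) -> dict:
    """Summarize config.json and the safetensors header without loading tensors."""
    root = Path(model_dir)
    config = json.loads((root / "config.json").read_text())
    with open(root / "model.safetensors", "rb") as f:
        (header_len,) = struct.unpack("<Q", f.read(8))  # unsigned 64-bit, little-endian
        header = json.loads(f.read(header_len))
    header.pop("__metadata__", None)  # optional string-to-string metadata, not a tensor
    return {
        "model_format_version": config.get("model_format_version"),
        "num_tensors": len(header),
        "num_params": sum(math.prod(t["shape"]) for t in header.values()),
        "dtypes": sorted({t["dtype"] for t in header.values()}),
    }
```

Usage: `inspect_model_dir("./sft_output/alta_sft_final")`. A missing `model_format_version` or an unexpected dtype is worth investigating before anything leaves your machine.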

### Testing a trained model

`scripts/test_inference.py` has 8 subcommands. Run from repo root.

```bash
# 1. Quick smoke test (3 prompts, <1 min) — ALWAYS run this first
python scripts/test_inference.py smoke --model ./sft_output/alta_sft_final

# 2. Full prompt suite (writes JSON report)
python scripts/test_inference.py suite \
    --model ./sft_output/alta_sft_final \
    --output ./results/run_$(date +%Y%m%d_%H%M).json

# 3. Interactive REPL for qualitative exploration
python scripts/test_inference.py chat \
    --model ./sft_output/alta_sft_final --stream

# 4. Single prompt
python scripts/test_inference.py single \
    --model ./sft_output/alta_sft_final \
    --prompt "Sobanura amateka y'u Rwanda" --stream

# 5. Multi-turn conversation test (catches memory bugs)
python scripts/test_inference.py multiturn --model ./sft_output/alta_sft_final

# 6. Sampling comparison (same prompt, different configs)
python scripts/test_inference.py compare \
    --model ./sft_output/alta_sft_final \
    --prompt "Mwiriwe!" \
    --configs '[{"temperature":0.3},{"temperature":0.8,"top_p":0.95}]'

# 7. Mask ablation (loads model twice — with/without non-Kinyarwanda mask)
python scripts/test_inference.py mask_ablation \
    --model ./sft_output/alta_sft_final --prompt "Bite?"

# 8. Throughput benchmark
python scripts/test_inference.py bench \
    --model ./sft_output/alta_sft_final --num_prompts 20 --device cuda --dtype bfloat16
```
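The `compare` subcommand varies `temperature` and `top_p`. For anyone less familiar with those knobs, here is a generic temperature plus nucleus (top-p) sampling sketch over a plain list of logits. It is a textbook version, not the package's sampler in `inference/`, which may differ in details such as tie-breaking or top-k.

```python
from __future__ import annotations

import math
import random


def sample_top_p(
    logits: list[float],
    temperature: float = 0.8,
    top_p: float = 0.95,
    rng: random.Random | None = None,
) -> int:
    """Draw a token id from the smallest set of tokens whose probability mass reaches top_p."""
    rng = rng or random.Random()
    if temperature <= 0:
        return max(range(len(logits)), key=logits.__getitem__)  # greedy decoding
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    weights = [math.exp(x - peak) for x in scaled]  # numerically stable softmax
    total = sum(weights)
    probs = [w / total for w in weights]

    # Keep the most likely tokens until their cumulative probability reaches top_p.
    order = sorted(range(len(probs)), key=probs.__getitem__, reverse=True)
    kept, mass = [], 0.0
    for idx in order:
        kept.append(idx)
        mass += probs[idx]
        if mass >= top_p:
            break

    # random.choices renormalizes the kept weights, so no explicit division is needed.
    return rng.choices(kept, weights=[probs[i] for i in kept], k=1)[0]
```

Lower temperature sharpens the distribution before the cutoff; lower `top_p` shrinks the candidate set after it. That is why the two settings in the `compare` example produce noticeably different styles.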

**Promotion criteria** before releasing a checkpoint publicly:

- [ ] `smoke` passes (no crashes, non-empty responses)
- [ ] `suite` has zero crashes; spot-check responses in at least 3 categories and confirm they look reasonable
- [ ] `multiturn` shows the model uses prior context (doesn't repeat introductions)
- [ ] `mask_ablation` shows the model produces clean Kinyarwanda even **without** the mask (a real fluency check)
- [ ] `bench` throughput is within expected range for the target hardware
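`mask_ablation` compares generation with and without the non-Kinyarwanda mask. A common way to implement such a mask is to set the logits of disallowed token ids to negative infinity before sampling, so they get zero probability. The sketch below shows that generic idea only. The `banned_ids` set is hypothetical, and the actual mask in `inference/` may be built and applied differently.

```python
import math


def apply_vocab_mask(logits: list[float], banned_ids: set[int]) -> list[float]:
    """Return a copy of logits with banned token ids set to -inf (zero probability after softmax)."""
    return [-math.inf if i in banned_ids else x for i, x in enumerate(logits)]


def softmax(logits: list[float]) -> list[float]:
    peak = max(logits)  # assumes at least one token is still allowed
    weights = [math.exp(x - peak) for x in logits]  # exp(-inf) evaluates to 0.0
    total = sum(weights)
    return [w / total for w in weights]


masked = apply_vocab_mask([2.0, 1.0, 3.0], banned_ids={2})
print(softmax(masked))
```

This is why the ablation works as a fluency check. With the mask removed, any probability the model still puts on non-Kinyarwanda tokens can show up in its output.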

### Exporting for distribution

`train.py` already saves in the distribution format, so this step is only needed if you want to:

- Bundle a tokenizer into the directory
- Tag the export with a release version string
- Convert an old `.pt` checkpoint to safetensors

```bash
python scripts/export_for_release.py \
    --checkpoint ./sft_output/alta_sft_final \
    --output ./release/alta-base-sft-v1.0 \
    --version v1.0 \
    --include_tokenizer \
    --tokenizer yalilabs/alta-tokenizer
```

Output:

```
release/alta-base-sft-v1.0/
├── config.json                   # with release_version + release_date metadata
├── model.safetensors
├── tokenizer.json                # bundled
├── special_tokens_map.json
├── tokenizer_config.json
└── README.md                     # auto-generated model card
```
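The export adds `release_version` and `release_date` to `config.json`. If you ever need to stamp an existing directory by hand, a minimal sketch looks like this. The ISO date format and leaving all other keys untouched are assumptions, so compare the result against what `export_for_release.py` writes before relying on it.

```python
from __future__ import annotations

import datetime
import json
from pathlib import Path


def stamp_release(model_dir: str, version: str, date: datetime.date | None = None) -> dict:
    """Add release_version/release_date to config.json in place and return the updated config."""
    path = Path(model_dir) / "config.json"
    config = json.loads(path.read_text())
    config["release_version"] = version
    config["release_date"] = (date or datetime.date.today()).isoformat()  # YYYY-MM-DD
    path.write_text(json.dumps(config, indent=2) + "\n")
    return config
```

Usage: `stamp_release("./release/alta-base-sft-v1.0", "v1.0")`.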

### Uploading weights to Hugging Face

Use `upload_to_hub.sh`. Before uploading, it checks authentication, that the target repo exists, that the model loads, and that the version tag is not already taken.

```bash
# Standard release
./scripts/upload_to_hub.sh \
    --model_dir ./release/alta-base-sft-v1.0 \
    --repo yalilabs/alta-base-sft \
    --version v1.0

# First-time release of a new model (creates repo if missing)
./scripts/upload_to_hub.sh \
    --model_dir ./release/alta-base-sft-v0.9 \
    --repo yalilabs/alta-base-sft \
    --version v0.9 \
    --private --create_repo

# CI-friendly (no prompts)
./scripts/upload_to_hub.sh \
    --model_dir ./release/alta-base-sft-v1.0 \
    --repo yalilabs/alta-base-sft \
    --version v1.0 --yes

# Dry-run to validate without uploading
./scripts/upload_to_hub.sh \
    --model_dir ./release/alta-base-sft-v1.0 \
    --repo yalilabs/alta-base-sft \
    --version v1.0 --dry_run
```
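The tag-collision check can also be done from Python, for example in a notebook before running the script. The sketch below uses `HfApi.list_repo_refs` from `huggingface_hub` to fetch existing tags. The tag pattern is an assumption inferred from the examples in this README (`v1.0`, `v1.0.1`, `v2.0-instruct`), and none of this is the script's implementation.

```python
import re

# Assumed convention, inferred from tags like v1.0, v1.0.1 and v2.0-instruct.
TAG_PATTERN = re.compile(r"^v\d+\.\d+(\.\d+)?(-[a-z0-9]+)?$")


def check_release_tag(version: str, existing_tags: set[str]) -> None:
    """Raise if the tag is malformed or already exists (Hub tags are treated as immutable)."""
    if not TAG_PATTERN.fullmatch(version):
        raise ValueError(f"tag {version!r} does not match the vX.Y[.Z][-suffix] convention")
    if version in existing_tags:
        raise ValueError(f"tag {version!r} already exists; release a new version instead")


def fetch_hub_tags(repo_id: str) -> set[str]:
    """Existing tag names on a Hub model repo (needs network access and huggingface_hub)."""
    from huggingface_hub import HfApi

    return {ref.name for ref in HfApi().list_repo_refs(repo_id).tags}
```

Usage: `check_release_tag("v1.0", fetch_hub_tags("yalilabs/alta-base-sft"))`.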

After upload, **always verify** by clearing the cache and loading fresh:

```bash
rm -rf ~/.cache/huggingface/hub/models--yalilabs--alta-base-sft
alta-sft chat --model yalilabs/alta-base-sft --revision v1.0
```

### Cutting a runtime package release

The package on PyPI versions independently of model weights. Bump the package version only when the **runtime code** changes — not when only weights change.

When to bump:

| Change | Bump |
|---|---|
| Bug fix in inference / CLI / server | patch (`0.1.0` → `0.1.1`) |
| New CLI flag, new optional arg, new public function | minor (`0.1.0` → `0.2.0`) |
| Removed function, renamed class, changed default behavior | major (`0.1.0` → `1.0.0`) |
| Breaking change to `config.json` schema | bump `MODEL_FORMAT_MAX` in `_version.py` AND major bump |
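The table maps each kind of change to a SemVer component. The arithmetic is simple but easy to get wrong by hand, because a minor bump resets the patch number and a major bump resets both. A small hypothetical helper, which deliberately ignores pre-release and build suffixes:

```python
def bump(version: str, part: str) -> str:
    """Bump a MAJOR.MINOR.PATCH version string; lower components reset to zero."""
    major, minor, patch = (int(x) for x in version.split("."))
    if part == "major":
        return f"{major + 1}.0.0"
    if part == "minor":
        return f"{major}.{minor + 1}.0"
    if part == "patch":
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown part: {part!r}")


assert bump("0.1.0", "patch") == "0.1.1"
assert bump("0.1.0", "minor") == "0.2.0"
assert bump("0.1.3", "major") == "1.0.0"
```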

Steps:

1. Update `src/alta_models_sft/_version.py`:
   ```python
   __version__ = "0.2.0"
   ```
2. Update `CHANGELOG.md` (top of file):
   ```markdown
   ## [0.2.0] - 2026-06-15
   ### Added
   - Stream support for `alta-sft generate`
   ### Fixed
   - KV cache overflow on 4096-token contexts
   ```
3. Commit, tag, push:
   ```bash
   git add . && git commit -m "Release 0.2.0"
   git tag v0.2.0
   git push origin main --tags
   ```
4. **GitHub Actions takes over**: `.github/workflows/release.yml` builds the wheel, verifies training code is excluded, and publishes to PyPI via trusted publishing.
5. Verify on PyPI:
   ```bash
   pip install -U alta-models-sft
   alta-sft --version          # should show 0.2.0
   ```
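The troubleshooting table lists a mismatch between the git tag and `_version.py` as a cause of failed PyPI uploads. A pre-push check could look like this. It reads `__version__` with a regex instead of importing the package, and the helper names are hypothetical, not something `release.yml` is known to run.

```python
import re

VERSION_RE = re.compile(r'^__version__\s*=\s*["\']([^"\']+)["\']', re.MULTILINE)


def read_version(source: str) -> str:
    """Extract the __version__ string from the text of _version.py."""
    match = VERSION_RE.search(source)
    if match is None:
        raise ValueError("no __version__ assignment found")
    return match.group(1)


def tag_matches(source: str, tag: str) -> bool:
    """True if the git tag (e.g. "v0.2.0") is "v" followed by __version__."""
    return tag == f"v{read_version(source)}"
```

Usage: `tag_matches(open("src/alta_models_sft/_version.py").read(), "v0.2.0")`.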

---

## Architecture: how training and the package share code

The single most important design decision in this repo: **the model architecture is defined exactly once**, in `src/alta_models_sft/modeling/model.py`. Training and inference both import from there.

```
                       ┌──────────────────────────────────────────┐
                       │  src/alta_models_sft/modeling/model.py  │
                       │  ALTAModel — single definition           │
                       └──────────────────────┬───────────────────┘
                                              │
                  ┌───────────────────────────┼───────────────────────────┐
                  │                           │                           │
                  ▼                           ▼                           ▼
       training/train.py        src/alta_models_sft/inference        external users
       (calls init_weights,     (ALTAChat.from_pretrained)            via `pip install`
        gradient ckpt,           — no init, no training paths
        chunked CE loss)
```

The model class has both training capabilities (chunked CE loss, weight init, gradient checkpointing toggles) and inference paths (KV-cached generation, safetensors loading). Inference users never invoke the training methods — they're just there, unused.

Why this matters: there's zero possibility of architecture drift between training-time and inference-time code. The shape of every tensor, the order of operations, the special tokens — all guaranteed identical.

**Don't** add a `training_model.py` that re-implements parts of the architecture. **Don't** copy modeling code into `training/`. If training needs something the model doesn't have, add it to the model class with a flag and document why.

---

## Versioning policy

Two version numbers, kept independent:

1. **Package version** (`src/alta_models_sft/_version.py` → `__version__`)
   - Versions the inference runtime, CLI, and server.
   - Follows [SemVer](https://semver.org/).
   - Released to PyPI.

2. **Model revision** (Hugging Face tags: `v1.0`, `v1.1`, `v2.0-instruct`, etc.)
   - Versions the actual weights.
   - Released to Hugging Face Hub.

The runtime checks the model's `model_format_version` against its supported range (`MODEL_FORMAT_MIN..MODEL_FORMAT_MAX`). If incompatible, loading fails with a clear error pointing at the fix.
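That compatibility check amounts to a range comparison. A minimal sketch, assuming `model_format_version` is an integer and using placeholder values for `MODEL_FORMAT_MIN`/`MODEL_FORMAT_MAX` (the real ones live in `_version.py`). The error wording and suggested fixes are illustrative.

```python
MODEL_FORMAT_MIN = 1  # placeholder; the real value is in src/alta_models_sft/_version.py
MODEL_FORMAT_MAX = 2  # placeholder


def check_model_format(config: dict) -> None:
    """Raise if config.json declares a format this runtime cannot load."""
    fmt = config.get("model_format_version")
    if fmt is None:
        raise ValueError("config.json has no model_format_version; was it written by save_pretrained()?")
    if fmt > MODEL_FORMAT_MAX:
        raise RuntimeError(
            f"model_format_version {fmt} is newer than this runtime supports "
            f"(max {MODEL_FORMAT_MAX}); upgrade with `pip install -U alta-models-sft`"
        )
    if fmt < MODEL_FORMAT_MIN:
        raise RuntimeError(
            f"model_format_version {fmt} is older than this runtime supports "
            f"(min {MODEL_FORMAT_MIN}); pin an older alta-models-sft release"
        )
```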

**Rule of thumb:** users in production should pin both the package version and the model revision:

```bash
pip install "alta-models-sft==0.1.0"
```

```python
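from alta_models_sft import ALTAChat  # assumed export path; it may live under alta_models_sft.inference

chat = ALTAChat.from_pretrained("yalilabs/alta-base-sft", revision="v1.0")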
```

---

## Operations

### Running CI locally before pushing

```bash
ruff check src tests
pytest --cov=alta_models_sft

# Build the wheel from a clean dist/ (stale wheels break the glob) and verify training code is NOT included
rm -rf dist/
python -m build
python -m zipfile -l dist/*.whl | grep -E "^(training/|scripts/|tests/)"
# ↑ Should print nothing. If anything prints, fix pyproject.toml.
```
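The `grep` check depends on the listing format of `python -m zipfile`. A slightly sturdier alternative reads the wheel with the standard-library `zipfile` module and returns a non-zero status on a leak. The forbidden prefixes mirror the ones above; the helper itself is hypothetical and not part of CI.

```python
import sys
import zipfile

FORBIDDEN_PREFIXES = ("training/", "scripts/", "tests/")


def leaked_paths(wheel_path: str) -> list[str]:
    """Return archive members that belong to repo-only directories."""
    with zipfile.ZipFile(wheel_path) as wheel:
        return [name for name in wheel.namelist() if name.startswith(FORBIDDEN_PREFIXES)]


def main(argv: list[str]) -> int:
    """Check every wheel path in argv; return 1 if anything leaked, else 0."""
    bad = [path for wheel in argv for path in leaked_paths(wheel)]
    for path in bad:
        print(f"leaked into wheel: {path}", file=sys.stderr)
    return 1 if bad else 0
```

Saved as a script with a `sys.exit(main(sys.argv[1:]))` entry point, it can check every wheel in `dist/` in one call.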

### Common gotchas

- **DDP runs need `torchrun`.** Plain `python -m training.train` only uses one GPU even on multi-GPU machines.
- **Tokenizer/model vocab mismatch.** If you change the tokenizer, you must re-pretrain — SFT can't recover from a vocab mismatch.
- **`max_seq_len` truncation drops assistant turns.** Long multi-turn samples that exceed `max_seq_len` get truncated from the right, which may remove the supervised target. The dataset logs this; check the filter breakdown.
- **PyPI is forever.** Never re-publish the same version number with different content. If 0.2.0 has a bug, release 0.2.1.
- **HF Hub tags should also be immutable** in practice. Don't re-tag `v1.0` — release `v1.0.1`.
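To see ahead of time which samples would lose their supervised target, you can simulate right truncation. The sketch below works on already-tokenized ids and assumes a sample is a list of `(role, token_ids)` turns whose ids already include any ChatML template tokens. That representation and the helper name are hypothetical, not the dataset class's internals.

```python
def surviving_assistant_tokens(turns: list[tuple[str, list[int]]], max_seq_len: int) -> int:
    """Count assistant tokens still present after truncating the joined sequence from the right."""
    kept, remaining = 0, max_seq_len
    for role, ids in turns:
        take = min(len(ids), remaining)  # tokens of this turn that fit in the window
        if role == "assistant":
            kept += take
        remaining -= take
        if remaining == 0:
            break
    return kept


sample = [("user", list(range(1500))), ("assistant", list(range(800)))]
print(surviving_assistant_tokens(sample, max_seq_len=2048))  # 548 of 800 target tokens survive
print(surviving_assistant_tokens(sample, max_seq_len=1024))  # 0: the target is gone entirely
```

A result of 0 means the sample contributes no loss at all, which is why the dataset filters such cases.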

### Where to look when things break

| Symptom | First place to check |
|---|---|
| Training crashes immediately | `./sft_output/logs/train_rank0.log` — usually a data-format issue |
| Training loss stuck high | Tokenizer/vocab mismatch; or `mask_user_tokens` config is wrong |
| Sample generations are garbage | Try `mask_ablation` test; verify ChatML format matches training |
| PyPI upload fails | Check `_version.py` matches the git tag; check trusted publishing config |
| HF upload fails auth | `huggingface-cli whoami` — token may have expired |
| Model loads on Hub but not locally | Run `python -c "import alta_models_sft; print(alta_models_sft.__version__)"` to verify install |

### Contacts

- Training questions: `#alta-training` Slack channel
- Infra / Hub uploads: `#ml-platform`
- Public releases: tag `@releases` in `#alta-models`

---

## License

The runtime package is Apache 2.0 (see [LICENSE](LICENSE)). Training data, internal benchmarks, and unpublished checkpoints are **internal only** and must not be checked into this repo.