Metadata-Version: 2.4
Name: eulerforge
Version: 0.1.0
Summary: Unified MoE+LoRA experimentation kit
Requires-Python: >=3.9
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: torch>=2.1
Requires-Dist: transformers>=5.5
Requires-Dist: safetensors>=0.4.2
Requires-Dist: datasets>=2.16
Requires-Dist: pyyaml>=6.0.1
Requires-Dist: accelerate>=0.27.0
Requires-Dist: bitsandbytes>=0.46.1
Requires-Dist: peft>=0.10
Requires-Dist: tqdm
Requires-Dist: optuna>=3.0
Provides-Extra: tb
Requires-Dist: tensorboard>=2.14; extra == "tb"
Dynamic: license-file

# EulerForge

> 🇰🇷 한국어: [README.ko.md](README.ko.md)

**EulerForge** is a unified fine-tuning toolkit for HuggingFace-compatible LLMs.
It brings together LoRA, Mixture-of-LoRAs, MoE-expert LoRA, and native-MoE fine-tuning
under a single YAML-driven CLI, with built-in support for SFT, DPO, ORPO, RM, and PPO.

One preset, one command — from a base model to a deployable checkpoint.

---

## Features

- **Five training paths** in one pipeline: SFT → DPO / ORPO → RM → PPO
- **Four injection strategies**: `dense_lora`, `mixture_lora`, `moe_expert_lora`, `native_moe_expert_lora`
- **Broad backbone support**: Qwen2/3, Llama 2/3, Gemma 3, Gemma 4 (dense + MoE), Mixtral
- **4-bit / 8-bit quantized training** (nf4 / int4 / int8) via bitsandbytes
- **Pipeline-friendly**: a checkpoint from one stage feeds the next (for example, a DPO run started from an SFT checkpoint auto-detects the base model and LoRA config)
- **Phase scheduling**: progressively unfreeze router → LoRA → base FFN
- **Preflight + MoE stability validation**: catch config errors before any GPU time is spent
- **Integrated benchmarking** with Ollama / OpenAI / local HF judge models
- **Hyperparameter search** (`grid` / `random` / `bayes`) via Optuna
- **HF export**: produce a standard HuggingFace directory that can be loaded with `from_pretrained()`
- **Internationalized CLI**: `--lang ko/en/zh/ja/es`

---

## Installation

```bash
pip install -e .
```

**Requirements**: Python ≥ 3.9, PyTorch ≥ 2.1, Transformers ≥ 5.5.

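Optional extras:

```bash
pip install -e ".[tb]"   # TensorBoard logging (quoted so shells such as zsh do not glob the brackets)
```

Optuna, used for grid / random / Bayesian search, is listed as a core dependency in the package metadata, so it is installed by default; `tb` is the only extra declared there.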

---

## Quickstart

### 1. Train

```bash
# Simplest: dense LoRA SFT on Qwen3.5-0.8B with raw JSONL data
eulerforge train \
    --preset configs/presets/qwen3.5_0.8b_dense_lora_sft.yml \
    --set data.format=raw \
    --set data.task=sft \
    --set data.path=data/sft_10k_raw.jsonl \
    --set data.max_length=512
```
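
The `--set` flags above override individual preset keys by dotted path. As a rough mental model only (this is not EulerForge's actual implementation), the sketch below applies such `key=value` strings to a nested dict. The helper names `parse_value` and `apply_overrides`, the scalar parsing rules, and the starting preset values are all assumptions made for illustration.

```python
from typing import Any


def parse_value(text: str) -> Any:
    """Best-effort scalar parsing: bool, then int, then float, else the raw string."""
    lowered = text.lower()
    if lowered in ("true", "false"):
        return lowered == "true"
    for cast in (int, float):
        try:
            return cast(text)
        except ValueError:
            pass
    return text


def apply_overrides(config: dict, overrides: list[str]) -> dict:
    """Apply 'a.b.c=value' strings to a nested dict in place and return it."""
    for item in overrides:
        dotted_key, _, raw = item.partition("=")
        *parents, leaf = dotted_key.split(".")
        node = config
        for name in parents:
            # Create intermediate sections that the preset does not define yet.
            node = node.setdefault(name, {})
        node[leaf] = parse_value(raw)
    return config


# Hypothetical preset contents; only the override strings come from the command above.
preset = {"data": {"format": "processed", "max_length": 1024}}
apply_overrides(preset, [
    "data.format=raw",
    "data.task=sft",
    "data.path=data/sft_10k_raw.jsonl",
    "data.max_length=512",
])
print(preset["data"])
```

Under this model, later overrides win over preset values, and keys missing from the preset are created on demand.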

### 2. Evaluate

```bash
# Benchmark against a baseline + judge model
eulerforge bench --preset configs/bench/sft_with_judge.yml \
    --target-output-dir outputs/run_YYYYMMDD_HHMMSS
```
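
`run_YYYYMMDD_HHMMSS` above is a placeholder for a real run directory. A small helper along these lines can pick the most recent one. It assumes run directories sit directly under a single output folder and follow exactly that timestamp pattern; `latest_run` is a hypothetical name, not part of EulerForge.

```python
import re
import tempfile
from datetime import datetime
from pathlib import Path
from typing import Optional

RUN_DIR = re.compile(r"^run_(\d{8}_\d{6})$")


def latest_run(outputs: Path) -> Optional[Path]:
    """Return the newest run_YYYYMMDD_HHMMSS directory under `outputs`, or None."""
    candidates = []
    for child in outputs.iterdir():
        match = RUN_DIR.match(child.name)
        if child.is_dir() and match:
            try:
                stamp = datetime.strptime(match.group(1), "%Y%m%d_%H%M%S")
            except ValueError:
                continue  # digits matched, but they do not form a valid date/time
            candidates.append((stamp, child))
    if not candidates:
        return None
    return max(candidates, key=lambda pair: pair[0])[1]


# Demonstration on a throwaway directory with made-up run names.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    for name in ("run_20240101_120000", "run_20240315_083000", "notes"):
        (root / name).mkdir()
    newest = latest_run(root)
    print(newest.name)
```

The returned path can then be passed to `--target-output-dir` or `--checkpoint`.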

### 3. Export

```bash
# Merge LoRA into a standard HF directory
eulerforge export-hf \
    --checkpoint outputs/run_YYYYMMDD_HHMMSS \
    --output ./exported
```
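
Since the export is described as a standard HuggingFace directory, it should be loadable with plain Transformers. The sketch below assumes the exported folder also contains the tokenizer files and that the model is a causal LM; `generate_from_export` is a hypothetical helper name, and other model types may need a different `Auto*` class.

```python
def generate_from_export(export_dir: str, prompt: str, max_new_tokens: int = 32) -> str:
    """Load an exported directory with plain Transformers and generate a completion."""
    # Imported lazily so that defining this helper does not load the heavy library.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    # Assumption: tokenizer files were written next to the model weights.
    tokenizer = AutoTokenizer.from_pretrained(export_dir)
    model = AutoModelForCausalLM.from_pretrained(export_dir)

    inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

For the export command above, the call would look like `generate_from_export("./exported", "Hello")`.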

### 4. Load in Python

```python
from eulerforge import load_model

result = load_model("outputs/run_YYYYMMDD_HHMMSS")
# result.model, result.tokenizer, result.metadata
```

---

## CLI Commands

| Command | What it does |
|---------|------|
| `eulerforge train` | Fine-tune with LoRA / MoE strategies + phase scheduling |
| `eulerforge convert` | Convert arbitrary JSONL → EulerForge raw JSONL (map / recipe) |
| `eulerforge preprocess` | Tokenize raw JSONL → processed JSONL (cached) |
| `eulerforge bench` | Run target / baseline / judge inference benchmark |
| `eulerforge export-hf` | Export trained checkpoint as a standard HF model directory |
| `eulerforge grid` | Hyperparameter search (grid / random / bayesian) |

Global options apply to all commands and go before the subcommand:

```bash
eulerforge --lang en train --preset ...   # English log output
eulerforge --lang ko train --preset ...   # Korean (default)
```

> 📖 Full CLI reference with every flag, YAML spec, training types, phase scheduling, load precision, and output structure: **[docs/cli_en.md](docs/cli_en.md)**

---

## Included Presets

A curated set of ready-to-run training presets lives under `configs/presets/`:

| Category | Example preset | Strategy | Training |
|----------|--------|----------|----------|
| Qwen 3.5 small | `qwen3.5_0.8b_dense_lora_sft.yml` | Dense LoRA | SFT |
| Qwen 3.5 small | `qwen3.5_0.8b_mixture_lora_sft.yml` | Mixture-of-LoRAs | SFT |
| Qwen 3.5 small | `qwen3.5_0.8b_moe_expert_lora_sft.yml` | MoE Expert LoRA | SFT |
| Qwen 3.5 small | `qwen3.5_0.8b_moe_expert_lora_dpo.yml` | MoE Expert LoRA | DPO |
| Qwen 3.5 small | `qwen3.5_0.8b_dense_lora_orpo.yml` | Dense LoRA | ORPO |
| Qwen 3.5 small | `qwen3.5_0.8b_dense_lora_rm.yml` | Dense LoRA | RM |
| Qwen 3.5 small | `qwen3.5_0.8b_dense_lora_ppo.yml` | Dense LoRA | PPO |
| Qwen 3.5 | `qwen3.5_4b_dense_lora_sft.yml` | Dense LoRA | SFT |
| Llama 3.2 | `llama3_1b_dense_lora_sft.yml` | Dense LoRA | SFT |
| Llama 3.2 | `llama3_1b_moe_expert_lora_sft_handoff.yml` | MoE Expert LoRA + Handoff | SFT |
| Gemma 3 | `gemma3_1b_mixture_lora_dpo.yml` | Mixture-of-LoRAs | DPO |
| Gemma 3 | `gemma3_4b_moe_expert_lora_orpo_handoff.yml` | MoE Expert LoRA + Handoff | ORPO |
| Gemma 4 (dense) | `gemma4_e2b_dense_lora_sft.yml`, `gemma4_e4b_*.yml` | All strategies | SFT / DPO |
| Gemma 4 (MoE) | `gemma4_26b_a4b_native_expert_lora_sft.yml` | Native MoE Expert LoRA | SFT |
| Mixtral | `mixtral_native_expert_lora_sft.yml` | Native MoE Expert LoRA | SFT |
| TinyLlama | `tinyllama_1.1b_dense_lora_dpo.yml` | Dense LoRA | DPO |

Domain-specific preset groups:

- `configs/presets/math/` — math SFT + DPO pipeline for Llama 3.2-1B
- `configs/presets/reasoning/` — 2-stage Mixture-of-LoRAs for CoT reasoning

Benchmark presets live under `configs/bench/`.

---

## Tutorials

EulerForge ships with a numbered tutorial series, available in both English and Korean.

Browse: [docs/tutorials/en/](docs/tutorials/en/) — [docs/tutorials/ko/](docs/tutorials/ko/)

Recommended reading order:

1. [Getting Started](docs/tutorials/en/getting_started.md) — install, strategy selection, CLI quickstart
2. [00. Data Preprocessing](docs/tutorials/en/00_data_preprocessing.md)
3. [01. Dense LoRA](docs/tutorials/en/01_dense_lora.md)
4. [02. Mixture-of-LoRAs](docs/tutorials/en/02_mixture_lora.md)
5. [03. MoE Expert LoRA](docs/tutorials/en/03_moe_expert_lora.md)
6. [04. Native MoE Expert LoRA](docs/tutorials/en/04_native_moe_expert_lora.md)
7. [05. DPO Training](docs/tutorials/en/05_dpo_training.md)
8. [06. ORPO Training](docs/tutorials/en/06_orpo_training.md)
9. [07. Reward Model Training](docs/tutorials/en/07_rm_training.md)
10. [08. PPO (RLHF) Training](docs/tutorials/en/08_ppo_training.md)
11. [09. MoE Stability & Validation](docs/tutorials/en/09_moe_stability_and_validation.md)
12. [10. Metrics Monitoring](docs/tutorials/en/10_metrics_monitoring.md)
13. [11. Inference Benchmark](docs/tutorials/en/11_bench.md)
14. [12. Grid / Random / Bayesian Search](docs/tutorials/en/12_grid_search.md)
15. [13. LLaMA Fine-Tuning](docs/tutorials/en/13_llama_finetuning.md)
16. [14. LoRA Handoff](docs/tutorials/en/14_lora_handoff.md)
17. [15. Loading Models](docs/tutorials/en/15_loading_models.md)
18. [16. HuggingFace Export](docs/tutorials/en/16_export_hf.md)
19. [17. Scratch Pretraining](docs/tutorials/en/17_pretrain.md)
20. [18. Training Pipeline (SFT → PPO)](docs/tutorials/en/18_training_pipeline.md)
21. [19. Data Collection for Labs](docs/tutorials/en/19_data_collection.md)
22. [20. Lab: Math SFT + DPO Pipeline](docs/tutorials/en/20_lab_math_coding.md)
23. [21. Lab: Chain-of-Thought Reasoning Model](docs/tutorials/en/21_lab_thinking_model.md)
24. [22. Lab: Korean Finance Copilot](docs/tutorials/en/22_lab_korean_finance_copilot.md)
25. [23. Lab: Full MoE Pipeline (SFT → DPO → RM → PPO)](docs/tutorials/en/23_lab_full_pipeline_moe.md)

---

## Architecture at a Glance

```
            YAML preset
                │
                ▼
   ┌──────────────────────────────┐
   │   Config resolve + validate  │  ← preflight + MoE stability checks
   └──────────┬───────────────────┘
              ▼
   ┌──────────────────────────────┐
   │   Base model load (bnb 4/8)  │  ← HuggingFace AutoModel
   └──────────┬───────────────────┘
              ▼
   ┌──────────────────────────────┐
   │   Injection (WHERE / WHAT)   │  ← dense_lora / mixture_lora / MoE expert LoRA
   └──────────┬───────────────────┘
              ▼
   ┌──────────────────────────────┐
   │   Phase scheduler (WHEN)     │  ← router → LoRA → base FFN
   └──────────┬───────────────────┘
              ▼
   ┌──────────────────────────────┐
   │   Training loop              │  ← AdamW + cosine LR + AMP + aux loss
   │   SFT / DPO / ORPO / RM / PPO│
   └──────────┬───────────────────┘
              ▼
   ┌──────────────────────────────┐
   │   Checkpoint + HF export     │  ← final / best / latest
   └──────────────────────────────┘
```

Key design points:

- **Backbone adapters** locate layers, FFN and attention modules for each model family, so the same injection / training code works across Qwen, Llama, Gemma, Mixtral, …
- **Phase scheduling** lets you stage which parameter groups are trainable over time (router warmup → LoRA → full FFN), which is intended to keep large-model fine-tuning stable and reproducible.
- **Pipeline checkpoints** automatically detect a prior EulerForge run and re-use its base model + LoRA config, so `SFT → DPO → RM → PPO` is one sequence of commands.
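
To make the phase-scheduling idea concrete, here is a minimal sketch of a step-based schedule that widens the set of trainable parameter groups from router to LoRA to base FFN. The step boundaries, the group names, the choice to keep earlier groups trainable, and the `Phase` / `trainable_groups` helpers are illustrative assumptions, not EulerForge's configuration schema or API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Phase:
    start_step: int       # first optimizer step at which this phase applies
    trainable: frozenset  # parameter groups that receive gradients in this phase


# Hypothetical boundaries: router warmup, then LoRA, then the base FFN as well.
SCHEDULE = [
    Phase(0, frozenset({"router"})),
    Phase(200, frozenset({"router", "lora"})),
    Phase(1000, frozenset({"router", "lora", "base_ffn"})),
]


def trainable_groups(step: int) -> frozenset:
    """Return the groups trainable at `step`: the last phase whose start_step <= step."""
    active = SCHEDULE[0]
    for phase in SCHEDULE:
        if phase.start_step <= step:
            active = phase
    return active.trainable


for step in (0, 199, 200, 1000):
    print(step, sorted(trainable_groups(step)))
```

In a PyTorch training loop, the result would typically be applied by setting `requires_grad` on the parameters of each group whenever the active phase changes.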

---

## Testing

```bash
# Run the public test suite (no GPU required)
pytest tests/ -x
```

This covers unit tests, CLI surface tests, validators, backbone adapters, loss functions, schedulers, i18n, and the plugin system. Test fixtures and synthetic data are included.

---

## Contributing

1. Run the test suite: `pytest tests/ -x`.
2. Open a PR with a clear summary of what changed and why.
3. Please keep CLI output i18n-clean (no hard-coded user-facing strings).

---

## License

Apache License 2.0. See [LICENSE](LICENSE).

---

## Contact

**EulerWa Inc.**
🌐 [eulerwa.com](https://eulerwa.com)
📧 [tech@eulerwa.com](mailto:tech@eulerwa.com)
