Metadata-Version: 2.4
Name: freesolo-flash
Version: 0.2.15
Summary: Flash — managed LoRA post-training (SFT/GRPO) for Freesolo environments, driven by the `flash` CLI
Project-URL: Homepage, https://github.com/freesolo-co/flash
Project-URL: Repository, https://github.com/freesolo-co/flash
Author: Freesolo
License-Expression: Apache-2.0
License-File: LICENSE
Keywords: fine-tuning,freesolo,grpo,llm,lora,rl,sft
Classifier: Development Status :: 4 - Beta
Classifier: License :: OSI Approved :: Apache Software License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Requires-Python: <3.13,>=3.11
Provides-Extra: dev
Requires-Dist: datasets>=2.19; extra == 'dev'
Requires-Dist: fastapi; extra == 'dev'
Requires-Dist: freesolo>=0.2.46; extra == 'dev'
Requires-Dist: httpx>=0.27; extra == 'dev'
Requires-Dist: huggingface-hub>=0.34; extra == 'dev'
Requires-Dist: mypy>=1.13.0; extra == 'dev'
Requires-Dist: pytest>=9.0.3; extra == 'dev'
Requires-Dist: ruff>=0.6; extra == 'dev'
Requires-Dist: runpod-flash; extra == 'dev'
Requires-Dist: uvicorn; extra == 'dev'
Provides-Extra: gpu
Requires-Dist: accelerate>=1.4; extra == 'gpu'
Requires-Dist: bitsandbytes>=0.49; extra == 'gpu'
Requires-Dist: datasets>=2.19; extra == 'gpu'
Requires-Dist: freesolo>=0.2.46; extra == 'gpu'
Requires-Dist: huggingface-hub>=0.34; extra == 'gpu'
Requires-Dist: peft>=0.19; extra == 'gpu'
Requires-Dist: torch==2.10.0; extra == 'gpu'
Requires-Dist: transformers<5.11,>=5.6; extra == 'gpu'
Requires-Dist: trl<1.7,>=1.6; extra == 'gpu'
Requires-Dist: vllm==0.19.1; extra == 'gpu'
Provides-Extra: server
Requires-Dist: datasets>=2.19; extra == 'server'
Requires-Dist: fastapi; extra == 'server'
Requires-Dist: freesolo>=0.2.46; extra == 'server'
Requires-Dist: httpx>=0.27; extra == 'server'
Requires-Dist: huggingface-hub>=0.34; extra == 'server'
Requires-Dist: runpod-flash; extra == 'server'
Requires-Dist: uvicorn; extra == 'server'
Description-Content-Type: text/markdown

# Flash

Flash is a managed LoRA post-training service: it runs SFT and GRPO on GPUs
from two providers, RunPod Flash (serverless queue; RTX 4090/5090 classes) and
Vast.ai (rented verified-datacenter instances; L40S / RTX Pro 4000 / A100
classes). The allocator picks the cheapest GPU class that fits the run across
both providers.
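
As a rough illustration of the "cheapest class that fits" rule, the sketch below filters candidate GPU classes by VRAM and takes the lowest hourly price, ignoring which provider a class comes from. `GpuClass`, `pick_cheapest`, the provider labels, and every price are assumptions made up for this example; they are not Flash's real allocator, data model, or pricing.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GpuClass:
    """Hypothetical record for one rentable GPU class."""
    provider: str        # illustrative label, e.g. "runpod" or "vast"
    name: str
    vram_gb: int
    usd_per_hour: float  # placeholder price, not a real quote


def pick_cheapest(classes: list[GpuClass], required_vram_gb: int) -> GpuClass:
    """Return the cheapest class with enough VRAM, regardless of provider."""
    fitting = [c for c in classes if c.vram_gb >= required_vram_gb]
    if not fitting:
        raise ValueError(f"no GPU class offers {required_vram_gb} GB of VRAM")
    return min(fitting, key=lambda c: c.usd_per_hour)


# Placeholder candidates; prices are invented for the example.
CANDIDATES = [
    GpuClass("runpod", "RTX 4090", 24, 0.70),
    GpuClass("runpod", "RTX 5090", 32, 0.95),
    GpuClass("vast", "L40S", 48, 0.85),
    GpuClass("vast", "A100", 80, 1.40),
]

if __name__ == "__main__":
    choice = pick_cheapest(CANDIDATES, required_vram_gb=30)
    print(choice.provider, choice.name)
```

With these invented prices, a run needing 30 GB skips the 24 GB class and lands on whichever remaining class is cheapest, even if that means switching provider.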

## Scope

- `flash train <cfg.toml>` / control-plane `POST /runs` — submit a training job;
  one dedicated GPU per run, supervised server-side (stall watchdog, bounded
  auto-retry resuming from the last streamed checkpoint, endpoint GC).
- `flash deploy`, `flash chat` — serving for trained adapters.
- **Freesolo SDK environments.** Every run names a Freesolo environment id.
  Scaffold `environment.py`, upload `.` or another folder with
  `flash env push --name <name> <folder>`, then reference the returned id. The
  worker loads it through `freesolo.environments`. There are no built-in task
  environments. Single-turn and bounded multi-turn environments are supported.
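
The bounded auto-retry mentioned in the `flash train` bullet above can be pictured as a small loop: try the run, and on failure retry a limited number of times, each time resuming from the most recently streamed checkpoint. This is only a sketch of the idea. `supervise`, `RunFailed`, and the two callbacks are hypothetical names, and the real supervisor in `flash/runner.py` also handles stall detection, cost guarding, and endpoint GC, which are left out here.

```python
from typing import Callable, Optional


class RunFailed(Exception):
    """Hypothetical error raised when an attempt stalls or crashes."""


def supervise(
    attempt: Callable[[Optional[str]], str],
    latest_checkpoint: Callable[[], Optional[str]],
    max_retries: int = 2,
) -> str:
    """Run `attempt`, retrying at most `max_retries` times after failures.

    The first attempt starts from scratch (resume point None). Every retry
    receives the latest streamed checkpoint so finished work is not repeated.
    """
    resume_from: Optional[str] = None
    for attempt_index in range(max_retries + 1):
        try:
            return attempt(resume_from)
        except RunFailed:
            if attempt_index == max_retries:
                raise  # retry budget exhausted: surface the failure
            resume_from = latest_checkpoint()
    raise AssertionError("unreachable")
```

Capping the retry count keeps a persistently failing run from consuming GPU time indefinitely, while resuming from a checkpoint keeps each retry cheap.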

## Layout

- `flash/catalog.py` — curated model catalog (Qwen3 dense supported tier;
  Qwen3.5/3.6 experimental tier) + `model_policy = "allow"` VRAM-fit check + each
  model's `thinking` capability (opt-in reasoning mode `thinking = true`)
- `flash/schema.py`, `flash/spec.py` — TOML → `JobSpec`
- `flash/runner.py` — server-side run supervisor (durable job handle,
  retries, cost guard, endpoint GC)
- `flash/providers/` — RunPod Flash + Vast.ai provider subtrees (pricing,
  gpus, durable submit/poll, preflight) behind one `base.Provider` protocol,
  with a cross-provider `allocator.py` that picks the cheapest fitting class
- `flash/engine/` — the on-GPU worker (TRL + colocated vLLM rollouts) and the
  shared recipe; SFT targets and RL rewards route through the active environment
  (task-specific grading lives with its example, not in the engine)
- `flash/envs/` — environment machinery: registry and the adapter that loads
  Freesolo SDK environments onto the worker's interface
- `flash env setup` — scaffold a starter local Freesolo env and a ready-to-run
  config to start from
- `flash/serve/`, `flash/server/` — adapter serving and the FastAPI control
  plane (run operator-side via the separate `flash-server` command)
- `flash/mcp/` — stdio MCP bridge for coding agents
- `Dockerfile` — the control-plane image (used by the repo docker-compose)
- `tests/` — pytest suite (CPU-only; offline-by-default, no GPU/network)

## Local commands

```bash
cd flash
uv sync --extra server
uv run pytest                           # CPU tests (offline-by-default, no GPU/network)
uv run ruff check . && uv run ruff format .
uv run flash --help
uv run flash-server                      # control plane (operator-side, run once)
```

The control plane owns provider credentials: `RUNPOD_API_KEY` (always required,
since RunPod is the default substrate), `VAST_API_KEY` (opt-in; checked only
when set), and the shared `HF_TOKEN`. The artifact repo is set per run (the run
TOML's `[train] hf_repo`), not through an operator-wide env var. Clients
authenticate with their Freesolo API key (`flash login`).
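
The credential rules above could be checked at startup along these lines. This is a sketch, not the control plane's actual preflight: `check_credentials` is a hypothetical helper, and treating `HF_TOKEN` as mandatory is an assumption (the text only says it is shared).

```python
import os
from typing import Mapping


def check_credentials(env: Mapping[str, str] = os.environ) -> dict[str, bool]:
    """Report which providers are usable given the environment variables.

    RUNPOD_API_KEY is always required; VAST_API_KEY only enables Vast.ai
    when it is set. HF_TOKEN is assumed to be required in this sketch.
    """
    missing = [key for key in ("RUNPOD_API_KEY", "HF_TOKEN") if not env.get(key)]
    if missing:
        raise RuntimeError(f"missing required credentials: {', '.join(missing)}")
    return {"runpod": True, "vast": bool(env.get("VAST_API_KEY"))}


if __name__ == "__main__":
    print(check_credentials({"RUNPOD_API_KEY": "rp-demo", "HF_TOKEN": "hf-demo"}))
```

Passing the environment as a mapping keeps the check easy to exercise in the offline test suite without touching real secrets.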
