Metadata-Version: 2.4
Name: model-preflight
Version: 0.1.8
Summary: Preflight checks for LLM prototypes.
Project-URL: Homepage, https://github.com/pylit-ai/model-preflight
Project-URL: Repository, https://github.com/pylit-ai/model-preflight
Project-URL: Issues, https://github.com/pylit-ai/model-preflight/issues
Author: ModelPreflight contributors
License-Expression: Apache-2.0
Keywords: evals,litellm,llm,prototypes,routing,smoke-tests
Requires-Python: >=3.11
Requires-Dist: litellm>=1.75
Requires-Dist: platformdirs>=4.3
Requires-Dist: pydantic-settings>=2.4
Requires-Dist: pydantic<3,>=2.7
Requires-Dist: python-dotenv>=1.2
Requires-Dist: pyyaml>=6.0
Requires-Dist: rich>=13.7
Requires-Dist: typer>=0.16
Provides-Extra: dev
Requires-Dist: mypy>=1.17; extra == 'dev'
Requires-Dist: pytest>=8.4; extra == 'dev'
Requires-Dist: ruff>=0.12; extra == 'dev'
Provides-Extra: keyring
Requires-Dist: keyring>=25; extra == 'keyring'
Description-Content-Type: text/markdown

# ModelPreflight

**Preflight checks for LLM prototypes.**

A tiny local gateway for LLM smoke tests, provider failover, and cheap prototype checks before you wire an LLM into something bigger.

[![CI](https://github.com/pylit-ai/model-preflight/actions/workflows/ci.yml/badge.svg)](https://github.com/pylit-ai/model-preflight/actions/workflows/ci.yml)
[![Python versions](https://img.shields.io/badge/python-3.11%2B-blue.svg)](https://www.python.org/downloads/)
[![PyPI version](https://img.shields.io/pypi/v/model-preflight?label=PyPI)](https://pypi.org/project/model-preflight/)
![License](https://img.shields.io/badge/license-Apache--2.0-lightgrey.svg)
![LiteLLM](https://img.shields.io/badge/router-LiteLLM-informational)

<img src="https://raw.githubusercontent.com/pylit-ai/model-preflight/main/docs/assets/hero.png" alt="ModelPreflight hero image" width="900">

| If you want to... | Start here |
|-------------------|------------|
| Get one green check quickly | [60-second start](#60-second-start) |
| Try it without keys | [No-key demo path](#no-key-demo-path) |
| Configure provider groups once | [Machine-local config](#machine-local-config) |
| Run project smoke cases | [Smoke tests](#smoke-tests) |
| Fan out a one-off prompt | [Pro Mode](#pro-mode) |
| Use it as a Python helper | [Library usage](#library-usage) |

ModelPreflight keeps provider setup **machine-local** and keeps smoke cases **project-local**. It gives prototypes stable model-group aliases, simple failover, and JSONL audit logs without becoming a benchmark harness or hosted gateway.

## 60-second start

```bash
uvx model-preflight --help

# In a persistent tool or project environment:
uv tool install model-preflight
# or:
pipx install model-preflight
```

Pick one provider, set one key, and run one live check:

```bash
mpf init --provider nvidia
export NVIDIA_NIM_API_KEY=...
mpf doctor --live
mpf demo
```

Expected signal:

- `mpf init --provider nvidia` writes your machine-local config and prints `next: mpf doctor --live`.
- `mpf doctor --live` prints a deployments table, then `live check ok: group=...`.
- `mpf demo` prints JSON with `"passed": true` and an empty `"failures": []` list.

Add checks to a project:

```bash
cd my-project
mpf init-project
mpf run
```

Expected signal:

- `mpf init-project` writes `evals/smoke.jsonl`, writes `.model-preflight/README.md`, and updates `.gitignore`.
- `mpf run` prints JSON results for the starter cases. Every passing case has `"passed": true`.
- A failing case exits non-zero and includes strings under `"failures"` so you know what drifted.

Both `mpf` and `model-preflight` are installed as console scripts.

ModelPreflight catches missing keys, broken provider routes, prompt formatting regressions, output-shape drift, accidental model/provider changes, and "this worked yesterday" prototype failures before you wire the LLM call into something larger.
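
Because `mpf run` exits non-zero when any case fails, it can gate a prototype script or CI step directly. A minimal sketch that relies only on the documented exit-code behavior (not on any particular JSON layout):

```python
import subprocess
import sys

# Run the project's smoke cases; `mpf run` prints JSON results and
# exits non-zero if any case fails.
result = subprocess.run(["mpf", "run"], capture_output=True, text=True)
print(result.stdout)

if result.returncode != 0:
    sys.exit("smoke cases failed -- fix the prototype before wiring it in")
```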

## No-key demo path

Use the minimal offline preset when you want to test the CLI and project workflow without a provider account:

```bash
mpf init --preset minimal
mpf doctor --live
mpf demo
mpf init-project
mpf run
```

What this proves:

- Config loading works without secrets.
- The CLI can run a live-style check through the offline echo provider.
- Project bootstrap works by creating `evals/smoke.jsonl`.
- Smoke scoring works when `mpf run` returns JSON where every case has `"passed": true`.

What it does not prove: remote provider auth, quota, latency, or model quality. Use the OpenRouter path below for that.

## Machine-local config

ModelPreflight reads provider routes from `~/.config/model-preflight/config.yaml` by default. Override the path with either `--config` or `MODEL_PREFLIGHT_CONFIG`.

```bash
mpf init --provider openrouter
mpf doctor
mpf models
```
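
The same override works when driving the CLI from a script. A small sketch, assuming only the documented `MODEL_PREFLIGHT_CONFIG` override (the scratch path is just an illustration):

```python
import os
import subprocess

# Point the CLI at a scratch config instead of the default
# ~/.config/model-preflight/config.yaml; MODEL_PREFLIGHT_CONFIG is the
# documented override, equivalent to passing --config.
env = dict(os.environ, MODEL_PREFLIGHT_CONFIG="/tmp/mpf-scratch-config.yaml")

# check=True raises if `mpf doctor` exits non-zero.
subprocess.run(["mpf", "doctor"], env=env, check=True)
```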

Provider setup is discoverable from the CLI:

```bash
mpf providers list
mpf providers guide nvidia
mpf providers guide openrouter
mpf providers test nvidia
mpf providers test openrouter
```

NVIDIA Build / NIM is the primary high-capability open/open-weight endpoint option. OpenRouter is still the lowest-friction discovery option because one API key can route to many model providers through an OpenAI-compatible API.

Use either primary path:

```bash
mpf init --provider nvidia
export NVIDIA_NIM_API_KEY=...
mpf doctor --provider nvidia --live

mpf init --provider openrouter
export OPENROUTER_API_KEY=...
mpf doctor --provider openrouter --live
```

| Provider | Best for | Env var | Setup |
|----------|----------|---------|-------|
| NVIDIA Build / NIM | Primary high-capability open/open-weight endpoint pool | `NVIDIA_NIM_API_KEY` | [API keys](https://build.nvidia.com/settings/api-keys) |
| OpenRouter | One-key first run with broad model access | `OPENROUTER_API_KEY` | [Authentication docs](https://openrouter.ai/docs/api-reference/authentication) |
| Groq | Fast repeated calls after first-run setup works | `GROQ_API_KEY` | [Groq console](https://console.groq.com/keys) |
| Cerebras | Fast inference experiments when current dev-tier limits fit | `CEREBRAS_API_KEY` | [Cerebras inference docs](https://inference-docs.cerebras.ai/) |
| Mistral | First-party Mistral model-family smoke checks | `MISTRAL_API_KEY` | [Mistral API keys](https://docs.mistral.ai/getting-started/quickstart/#account-setup) |

A secondary/overflow pool worth adding manually once the primary pool works: Google Gemini/Gemma, Cloudflare Workers AI, GitHub Models, Hugging Face Inference Providers, and SambaNova. These are documented in `docs/PROVIDER_PRESETS.md` but are not packaged as first-run presets yet, because their auth shape, model IDs, and free/dev limits are more account-specific.
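
Before running `--live` checks it can help to see which of the primary-pool keys are already exported. A small sketch using only the env var names from the table above:

```python
import os

# Env var names taken from the provider table above.
PRIMARY_POOL = {
    "NVIDIA Build / NIM": "NVIDIA_NIM_API_KEY",
    "OpenRouter": "OPENROUTER_API_KEY",
    "Groq": "GROQ_API_KEY",
    "Cerebras": "CEREBRAS_API_KEY",
    "Mistral": "MISTRAL_API_KEY",
}

for provider, var in PRIMARY_POOL.items():
    state = "set" if os.environ.get(var) else "missing"
    print(f"{provider:<22} {var:<22} {state}")
```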

The default config creates logical groups, then maps each group to one or more LiteLLM deployments:

```yaml
router:
  num_retries: 1
  timeout_seconds: 60
  default_group: free_reasoning
  audit_jsonl: null
artifacts_dir: ~/.cache/model-preflight/artifacts

deployments:
  - name: nvidia_nim_nemotron_3_super
    provider: nvidia
    group: free_reasoning
    model: nvidia_nim/nvidia/nemotron-3-super-120b-a12b
    api_key_env: NVIDIA_NIM_API_KEY
    enabled: true
    required: true
    status: best_effort
    setup_url: https://build.nvidia.com/settings/api-keys
    rpm: 10
    tier: reasoning
```

Provider presets are best-effort starter data, not authoritative claims about free availability. User-local config wins over bundled defaults, optional/disabled providers do not block first-run checks, and endpoint names, quotas, pricing, and behavior can change without this package knowing.
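
A quick way to visualize the group-to-deployment mapping is to read the YAML directly. This is only a sketch: it assumes the default config path quoted above and the schema shown in the snippet; for live use, prefer `load_config()` from the library API.

```python
from collections import defaultdict
from pathlib import Path

import yaml  # pyyaml is already a dependency of model-preflight

# Read the machine-local config directly, just to list which
# deployments back each logical group.
config_path = Path.home() / ".config" / "model-preflight" / "config.yaml"
config = yaml.safe_load(config_path.read_text())

groups = defaultdict(list)
for deployment in config.get("deployments", []):
    groups[deployment["group"]].append(deployment["name"])

for group, names in groups.items():
    print(f"{group}: {', '.join(names)}")
```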

## Smoke tests

Smoke cases are JSONL files owned by the project that is doing the prototype work.

```jsonl
{"id":"basic-ok","prompt":"Return only: ok","expected_substrings":["ok"]}
{"id":"avoid-word","prompt":"Answer yes without using the word nope","forbidden_substrings":["nope"]}
```

Run them with:

```bash
mpf run
# or:
mpf run path/to/smoke_cases.jsonl
```

`mpf run` prints JSON results and exits non-zero if any case fails.
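
Cases are plain JSON objects, so a project can also generate them from code. A sketch that appends new cases using the fields shown above (`id`, `prompt`, `expected_substrings`, `forbidden_substrings`); the example cases are illustrative, and it assumes `evals/` already exists from `mpf init-project`:

```python
import json
from pathlib import Path

cases = [
    {"id": "greeting-ok", "prompt": "Return only: hello", "expected_substrings": ["hello"]},
    {"id": "no-apology", "prompt": "Answer tersely", "forbidden_substrings": ["sorry"]},
]

# One JSON object per line, matching the JSONL format above.
path = Path("evals/smoke.jsonl")
with path.open("a", encoding="utf-8") as handle:
    for case in cases:
        handle.write(json.dumps(case) + "\n")
```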

## Pro Mode

`mpf pro` fans out a one-off prompt, then synthesizes a final answer through a judge group.

```bash
mpf pro "Suggest three robust JSON schemas for this toy extraction task" --n 8
```

Defaults:

| Option | Default | Role |
|--------|---------|------|
| `--n` | `8` | number of sampled answers |
| `--sample-group` | `free_fast` | fanout group |
| `--judge-group` | `free_reasoning` | synthesis group |

Fanout multiplies live provider calls. Keep `--n` low while testing, use restricted provider keys where available, and review provider dashboards when running against paid endpoints.
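
The same fanout is available from Python through `pro_mode()` (see Library usage below). A sketch that keeps `n` small while iterating; the prompt is just an example:

```python
from model_preflight import ModelGateway, load_config, pro_mode

gateway = ModelGateway(load_config())

# Keep n small while experimenting: each sample is a live provider call.
result = pro_mode(gateway, "Draft two candidate prompts for this extraction task", n=2)
print(result["final"])
```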

## Library usage

```python
from model_preflight import ModelGateway, load_config, pro_mode

gateway = ModelGateway(load_config())

print(gateway.text("Return only: ok", group="free_reasoning"))

result = pro_mode(gateway, "Solve this toy puzzle", n=8)
print(result["final"])
```

The library API is intentionally thin:

- `load_config()` reads the same machine-local config as the CLI
- `ModelGateway` wraps LiteLLM Router with stable group aliases and audit logging
- `pro_mode()` runs fanout plus synthesis for one-off prototype prompts
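
`mpf run` is the supported way to score cases, but the same pieces compose in a few lines when a prototype needs custom scoring. A sketch using only the documented gateway API plus the case fields from the Smoke tests section; the group name matches the default config shown earlier:

```python
import json
from pathlib import Path

from model_preflight import ModelGateway, load_config

gateway = ModelGateway(load_config())

# A minimal substring check over the project's smoke cases.
for line in Path("evals/smoke.jsonl").read_text().splitlines():
    case = json.loads(line)
    answer = gateway.text(case["prompt"], group="free_reasoning")
    missing = [s for s in case.get("expected_substrings", []) if s not in answer]
    present = [s for s in case.get("forbidden_substrings", []) if s in answer]
    status = "pass" if not missing and not present else "fail"
    print(f"{case['id']}: {status}")
```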

## Audit artifacts

By default, ModelPreflight writes audit logs under:

```text
~/.cache/model-preflight/artifacts/audit.jsonl
```

Each live call should be traceable enough to debug provider drift: timestamp, logical group, prompt or case metadata, latency, and, when available, the resolved provider/model, token usage, and response id.
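
The quickest way to inspect recent calls is to tail the JSONL file directly. This sketch assumes the default artifacts path above and avoids assuming exact field names; it just pretty-prints the last few records:

```python
import json
from pathlib import Path

audit_path = Path.home() / ".cache" / "model-preflight" / "artifacts" / "audit.jsonl"

# Pretty-print the last three audit records without assuming a schema.
for line in audit_path.read_text().splitlines()[-3:]:
    print(json.dumps(json.loads(line), indent=2))
```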

## Non-goals

ModelPreflight is not a model leaderboard, a formal benchmark framework, a hosted inference gateway, a provider catalog authority, or proof that an endpoint is free, fast, or available today.
