Metadata-Version: 2.4
Name: cram-ai
Version: 0.3.0
Summary: Token-waste observability for AI coding agents — measure orientation tax, context bloat, and cache engagement from real transcripts
License-Expression: MIT
Project-URL: Homepage, https://github.com/vishbay/cram-ai
Project-URL: Repository, https://github.com/vishbay/cram-ai
Project-URL: Bug Tracker, https://github.com/vishbay/cram-ai/issues
Keywords: ai,llm,context,token,coding,claude,cursor,copilot
Classifier: Development Status :: 4 - Beta
Classifier: Environment :: Console
Classifier: Intended Audience :: Developers
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Classifier: Topic :: Utilities
Requires-Python: >=3.10
Description-Content-Type: text/markdown
Requires-Dist: anthropic>=0.40.0
Requires-Dist: tomli>=2.0; python_version < "3.11"
Provides-Extra: multi-provider
Requires-Dist: litellm>=1.40.0; extra == "multi-provider"
Provides-Extra: mcp
Requires-Dist: mcp>=1.0.0; extra == "mcp"
Provides-Extra: tui
Requires-Dist: textual>=0.80; extra == "tui"
Provides-Extra: dev
Requires-Dist: pytest>=8.0; extra == "dev"

# cram-ai

[![PyPI](https://img.shields.io/pypi/v/cram-ai?color=%237b2fff&style=flat-square)](https://pypi.org/project/cram-ai/)
[![Python](https://img.shields.io/pypi/pyversions/cram-ai?color=%2300f5d4&style=flat-square)](https://pypi.org/project/cram-ai/)
[![License](https://img.shields.io/github/license/vishbay/cram-ai?color=%23f72585&style=flat-square)](LICENSE)

cram prevents expensive exploratory file reads by pre-loading focused project, task, symbol,
decision, and gotcha context before coding starts. Instead of spending the first few exchanges
re-discovering your codebase, your AI tool arrives oriented.

Works with **Claude Code, Cursor, Windsurf, Zed, Codex, GitHub Copilot, Gemini CLI**, and any
tool that reads a file on startup. Custom tool targets are supported via config.

---

## Install

```bash
# Standard — includes MCP server for Claude Code / Cursor / Windsurf / Zed
pip install 'cram-ai[mcp]'

# With TUI dashboard
pip install 'cram-ai[mcp,tui]'

# With additional model providers (OpenAI, Gemini, Bedrock, Ollama …)
pip install 'cram-ai[mcp,multi-provider]'
```

---

## Quick start

```bash
cd your-repo

# 1. One-time setup
cram init
#   → scans your repo, generates ARCHITECTURE.md + SYMBOLS.md via a cheap model
#   → scaffolds DECISIONS.md + GOTCHAS.md for you to fill in
#   → installs a git post-commit hook to keep context fresh

# 2. Fill in the manual files — this is where cram's real value lives
vim .ai-context/DECISIONS.md   # architectural invariants, naming conventions
vim .ai-context/GOTCHAS.md     # non-obvious traps that burned your team

# Or mine your git history for decisions automatically:
cram decisions --mine

# 3. Commit so teammates get the context layer too
git add .ai-context/ CLAUDE.md
git commit -m "chore: init cram-ai context layer"
```

Then wire up your tool of choice — [MCP](#mcp-delivery) or [file-based delivery](#file-based-delivery).

---

## The context layer

cram maintains five files in `.ai-context/`. Two are auto-generated. Two are manual. One is
generated per task.

```
your-repo/
└── .ai-context/
    ├── ARCHITECTURE.md   ← auto  · repo structure, tech stack, key files
    ├── SYMBOLS.md        ← auto  · every source file mapped to its public identifiers
    ├── DECISIONS.md      ← manual · architectural commitments your team has made
    ├── GOTCHAS.md        ← manual · non-obvious traps, foot-guns, things that burn people
    └── CURRENT_TASK.md   ← per-task · focused excerpts for the current work
```

**Auto-generated** (`ARCHITECTURE.md`, `SYMBOLS.md`):
- Generated by `cram init`, refreshed automatically via the git post-commit hook after each commit
- `SYMBOLS.md` is the candidate pool for file selection — every source file mapped to its public
  identifiers via regex. Deterministic, no LLM cost, byte-stable across runs. The LLM picks
  from this index rather than scanning raw source files, so it stays grounded in real symbols.
- `ARCHITECTURE.md` uses a cheap model (Haiku / Gemini Flash / GPT-4o Mini)

**Manual** (`DECISIONS.md`, `GOTCHAS.md`):
- Scaffolded by `cram init` — you fill them in over time
- `DECISIONS.md`: "we use X", "never do Y", naming conventions, non-obvious invariants
- `GOTCHAS.md`: silent side effects, middleware gaps, surprising nulls — things grep can't tell you
- Append entries with `cram decide "..."`, `cram gotcha "..."`, or mine git history with `cram decisions --mine`
- Agents can propose decisions mid-session using the `propose_decision` MCP tool

**Output protection by default:**
Every file cram generates for file-based targets includes command output protection rules.
Your agent won't accidentally read 80 KB of build output mid-session:

- Unknown commands are byte-capped to 6,000 bytes by default (`COMMAND 2>&1 | head -c 6000`)
- File inspection uses `head`/`tail`, never raw `cat`
- Large outputs go to a temp file you inspect in ranges
- Configurable in `.ai-context/config.toml` under `[output]` (`byte_cap`, `line_cap`, `temp_file`)

---

**Per-task** (`CURRENT_TASK.md`):
When you call `get_context("task description")`, cram runs a four-stage pipeline:

1. **Symbol index** — reads `SYMBOLS.md` (every public identifier, by file) as the candidate pool
2. **LLM selection** — a cheap model picks relevant files from the symbol index and names key identifiers
3. **Excerpt extraction** — pulls identifier-focused excerpts from selected files (not full files)
4. **Context write** — assembles into `CURRENT_TASK.md` (~800–1,500 tokens)

> **What this replaces:** the agent spending 3–5 tool calls grep-ing and reading files to orient
> itself at the start of every session. With cram the context arrives in one call and includes
> knowledge — decisions, gotchas — that the agent can't discover by searching.

---

## MCP delivery

If your tool supports MCP (Claude Code, Cursor, Windsurf, Zed, Codex CLI), wire up the cram MCP
server once and the tool can call context tools directly.

**One-time server config** (same format for all MCP clients):

```json
{
  "mcpServers": {
    "cram-ai": {
      "command": "cram",
      "args": ["mcp", "--repo", "/absolute/path/to/your-repo"]
    }
  }
}
```

| Client | Config file |
|---|---|
| Claude Code | `.mcp.json` at repo root |
| Cursor | `.cursor/mcp.json` or Cursor Settings → MCP |
| Windsurf | Windsurf MCP settings |
| Zed | Zed assistant settings → context servers |
| Codex CLI | `~/.codex/config.yaml` → `mcpServers` |

**Available MCP tools:**

| Tool | What it returns | When to call it |
|---|---|---|
| `get_context(task='')` | Runs symbol lookup → file selection → excerpt extraction. No-arg: returns last CURRENT_TASK.md without re-running the LLM. Prepends a staleness warning when context is stale or critical. | First thing every session |
| `get_architecture()` | ARCHITECTURE.md — repo structure, tech stack, key files | Orientation in an unfamiliar area |
| `get_symbols(query='')` | SYMBOLS.md — source files mapped to public identifiers, optionally filtered | Finding where a function is defined |
| `get_decisions()` | DECISIONS.md — architectural commitments | Before making a design choice |
| `get_gotchas()` | GOTCHAS.md — non-obvious traps and foot-guns | Before touching an unfamiliar area |
| `propose_decision(text, reason='', alternatives='')` | Appends a `[PENDING]` entry to DECISIONS.md for owner review. Logs to `suggestions.jsonl` for `cram ui`. | When you make an architectural choice worth recording |
| `add_file(path, identifiers='')` | Appends a file's excerpts to CURRENT_TASK.md | When a mid-task discovery needs new context |
| `get_health()` | Deterministic markdown: staleness score (0–10), commits since last sync, per-file token counts vs soft budgets. Safe to cache. | Before trusting loaded context on a long-running branch |

---

## Agent write-back

Agents can propose decisions directly to DECISIONS.md without leaving their session:

```
propose_decision(
  text="use JWT over session cookies",
  reason="stateless — scales horizontally without a session store",
  alternatives="redis session store, cookie-based sessions"
)
```

This appends a `[PENDING]` entry. The entry is surfaced in `cram ui` where you approve or
discard it with a single keystroke. Nothing is written to the canonical record without owner
review.

---

## File-based delivery

For tools that don't support MCP, run `cram task "..." --target <tool>` before your session.
cram writes focused context into the file the tool auto-loads at startup.

```bash
# GitHub Copilot
cram task "add pagination to the users endpoint" --target copilot
# → writes to .github/cram-task.md (one-time: add an include line to copilot-instructions.md)

# Cursor (no-MCP fallback)
cram task "add pagination to the users endpoint" --target cursor
# → writes to .cursor/rules/cram-task.md

# Gemini CLI
cram task "refactor the auth module" --target gemini
# → writes to GEMINI.md (marker-based, preserves your content outside cram's section)

# All targets at once
cram task "add pagination to the users endpoint" --target all
```

| Target | File written |
|---|---|
| `cursor` | `.cursor/rules/cram-task.md` |
| `windsurf` | `.windsurf/rules/cram-task.md` |
| `copilot` | `.github/cram-task.md` |
| `codex` | `AGENTS.md` (repo root) |
| `gemini` | `GEMINI.md` (repo root, marker-based upsert) |
| `claude` | `CLAUDE.md` (escape hatch for Claude Code; prefer MCP) |
| `all` | All detected targets |

**Custom targets** — for tools with non-standard instruction files (e.g., an enterprise IDE with
`ACME.md`), add a section to `.ai-context/config.toml`:

```toml
[targets.acme]
file      = "ACME.md"
indicator = "acme.config.json"   # optional: file/dir that signals tool is active
upsert    = true                 # optional: use cram markers to preserve your content
```

Then `cram task "..." --target acme` works like any built-in target.

**Enterprise gateways** — for internal model proxies that use SSO tokens instead of API keys,
add to `~/.config/cram-ai/settings.json`:

```json
{
  "proxy": {
    "base_url": "https://gateway.corp/v1",
    "headers": { "X-Corp-Token": "your-sso-token" }
  }
}
```

cram passes `base_url` as `api_base`, `headers` as `extra_headers`, and uses a dummy
`api_key` so litellm doesn't abort. No API key needed.

---

## Decisions mining

DECISIONS.md is the hardest file to keep current — it depends entirely on human discipline.
`cram decisions --mine` automates the first draft:

```bash
cram decisions --mine          # scan last 90 days of git history
cram decisions --mine --days 180
```

It scans git log for decision-shaped language, runs a cheap model to extract structured
entries, then walks you through them one at a time (`git add -p` style):

```
── Draft 1/3 ──────────────────────────────────────
  Decision: use JWT over session cookies
  Reason:   reduces server-side state, scales horizontally
  [a]ccept  [s]kip  [e]dit  [q]uit >
```

Accepted entries are appended to DECISIONS.md immediately.

---

## TUI dashboard

`cram ui` opens a Textual terminal dashboard:

```bash
pip install 'cram-ai[tui]'
cram ui
```

Six tabs:

- **Audit** (default) — the orientation-tax numbers for the last 30 days: reads before
  first edit, read-to-edit ratio with band, cache writes/reads per session, a
  cache-engagement check (sessions that wrote cache but never read it back),
  context-bloat metrics (context per request, read-cost tail share, carried cost of
  oversized tool results, redundant re-reads), and a weekly trend of the primary
  metric. The dashboard opens on the number, not the knobs.
- **Decisions** — pending agent proposals at top, accepted history below.
  Press `a` to approve the focused entry, `d` to delete it. Badge shows pending count.
- **Sessions** — recent Claude Code sessions with reads, edits, read-to-edit ratio, and
  the **task that was active** during each session. Task names are inferred from
  `TASK_HISTORY.jsonl` by matching each session's timestamp against task time windows.
  Ratio > 5× is flagged — context isn't landing for those sessions.
- **Health** — staleness score, commits since last sync, per-file token budgets.
- **History** — recent `cram task` invocations with timestamps. The **active session
  task** is shown at the top in green (it lives in `session.json` and hasn't been
  archived yet). This is a recall aid ("what was I working on last Tuesday?"), not a
  project management tool.
- **Actions** — run `cram sync`, `cram task`, `cram benchmark`, or `cram doctor` from
  inside the TUI. An animated progress bar shows while a command is running.

Each tab refreshes its data when you switch to it. Auto-refreshes every 30 seconds.
`r` forces a full refresh, `q` quits.

---

## Orientation tax measurement

`cram audit` measures how much of each session is spent on navigation vs. actual work:

```bash
cram audit           # last 30 days for this repo
cram audit --days 7  # tighter window
cram audit --all     # all projects
cram audit --json    # structured output for dashboards / scripts
cram audit --compare PATH_A PATH_B   # side-by-side A/B of two checkouts
```

`--compare` prints both checkouts' metrics with deltas — built for attribution
experiments: keep one checkout wired with cram, one plain, alternate tasks between
them, and compare after a couple of weeks.

Output includes:

```
Avg reads before first edit:    8.2  ← primary metric
Avg edits/session:              3.1
Avg read-to-edit ratio:         2.6×  ~ normal
Cache engagement:               18/24 sessions read from cache

Ratio guide: < 2× good · 2–5× normal · > 5× context isn't landing
```

The cache-engagement line is the silent-failure check: a session that wrote cache but
never read it back paid the 1.25× write price for nothing — a signal that prompt
caching isn't engaging (prefix instability, sub-floor prefix, or a misconfigured proxy).

The report also measures **context bloat** — usually the largest waste bucket: average
context re-read per request, the share of read-cost in the final third of turns
(33% = flat; higher means the context is growing), the *carried cost* of oversized
tool results (a big result entering at turn k is re-read by every later turn), and
redundant same-file reads. Thresholds: `CRAM_AUDIT_BIG_RESULT_BYTES` (default 20000).

**Retry loops** are reported when present: failed tool calls per session (`is_error`
tool results — each usually means a retry follows) and same-file re-edits per session
(a couple is normal; sustained churn means the agent is thrashing).

Dollar attribution is **provider-pluggable**: set `CRAM_PROVIDER` to `anthropic`
(default), `openai`, `gemini`, or `local` (zero-dollar — the cost is latency). Prices
are representative defaults; override per field with `CRAM_PRICE_INPUT_PER_MTOK`,
`CRAM_CACHE_WRITE_MULT`, `CRAM_CACHE_READ_MULT` for billing-grade numbers.

Run before and after adopting cram to measure the reduction.

---

## Daily workflow

```bash
# Before a session — MCP path (Claude Code / Cursor / Windsurf / Zed)
# Nothing to run. The agent calls get_context() itself.

# Before a session — file-based delivery (Copilot / no-MCP tools)
cram task "fix the rate limiter" --target copilot

# Log a decision while working
cram decide "use cursor-based pagination, not offset — offset breaks under concurrent writes"

# Mine git history for past decisions
cram decisions --mine

# Log a gotcha you just found
cram gotcha "the users.email column is nullable in prod despite NOT NULL in schema.prisma"

# Extend grace period if you commit mid-task (prevents context reset)
cram continue

# Check context freshness
cram status

# Review pending agent proposals + session efficiency
cram ui
```

After every commit the git post-commit hook runs `cram sync` automatically to refresh
`ARCHITECTURE.md` and `SYMBOLS.md`. A session grace period prevents sync from firing while
you're mid-task.

---

## Context health

cram tracks how stale your context is with a **0–10 staleness score** derived from git — the
number of commits on HEAD since ARCHITECTURE.md was last regenerated. No new state files; the
score is always correct after a teammate pull.

The post-commit hook writes ARCHITECTURE.md to disk but does not commit it. The health check
detects this correctly: if the file has uncommitted changes (i.e., it was rewritten by `cram sync`
after the last commit), the score is reported as 0 — not stale.

| Score | Band | Meaning |
|---|---|---|
| 0–2 | `fresh` | Up to date — work freely |
| 3–5 | `acceptable` | Drifting slightly — fine to continue |
| 6–7 | `stale` | Update before next session |
| 8–10 | `critical` | Sync now — context may mislead |

The score falls back to an mtime check when git is unavailable. The critical threshold defaults to
10 commits and is tunable via `CRAM_STALE_CRITICAL_COMMITS`.

**Where health surfaces:**
- `cram status` — per-file age table + health line with score, band, and commit count
- `cram ui` → Health tab — staleness score + per-file token budgets + active task slots
- Tray badge — shows band + score (`stale 6/10`) with band-appropriate color
- `get_health()` MCP tool — deterministic markdown block the agent can call before trusting context
- `get_context()` — prepends a one-line staleness warning when band is `stale` or `critical`
- `cram sync` — warns to stderr after regenerating if any frozen file exceeds its soft token budget

**Soft token budgets** (warnings only — nothing is ever truncated):

| File | Default budget | Override |
|---|---|---|
| `ARCHITECTURE.md` | 2,000 tok | `CRAM_BUDGET_ARCHITECTURE` |
| `DECISIONS.md` | 600 tok | `CRAM_BUDGET_DECISIONS` |
| `GOTCHAS.md` | 400 tok | `CRAM_BUDGET_GOTCHAS` |
| `CURRENT_TASK.md` | 800 tok | `CRAM_BUDGET_TASK` |
| `SYMBOLS.md` | no budget | scales with repo size |

---

## Team and concurrency

cram is designed for **one developer, one repo checkout**. Context files live in
`.ai-context/` which is typically gitignored, so each developer works with their own
independent context.

| Scenario | Works? |
|---|---|
| One developer, one agent | Yes — designed for this |
| One developer, parallel agents (same session) | Yes — each `get_context()` call gets its own slot under `.ai-context/tasks/` |
| Multiple developers, separate checkouts | Independent — no sharing or coordination |
| Multiple developers wanting shared context | Not supported |

The slot system protects against concurrent MCP calls within a single server process. It is
**not** a collaboration feature — there is no shared state, sync, or conflict resolution
across different checkouts or machines.

---

## CLI reference

| Command | What it does |
|---|---|
| `cram init [path] [--team]` | One-time setup — scans repo, generates context files, installs git hook |
| `cram mcp [--repo PATH]` | Start MCP server (stdio). Wire into your tool's settings once; clients launch it automatically. |
| `cram task "..." [--target T]` | Run context pipeline, write CURRENT_TASK.md, optionally inject into tool's auto-loaded file |
| `cram decisions [--mine] [--days N]` | Show DECISIONS.md, or mine git history for decision-shaped commits and review interactively |
| `cram sync [path]` | Refresh ARCHITECTURE.md + SYMBOLS.md from current repo state. If the session grace period has expired, archives the current task to `TASK_HISTORY.jsonl` and resets the task context in all target files (your instructions are untouched — only the cram-managed task section is cleared). |
| `cram decide "..." [path]` | Append a dated architectural decision to DECISIONS.md |
| `cram gotcha "..." [path]` | Append a non-obvious trap to GOTCHAS.md |
| `cram continue [path]` | Extend grace period — keep context across a mid-task commit |
| `cram status [path]` | Show each context file with age, line count, and token budget status |
| `cram audit [--days N] [--all] [--json]` | Measure reads-before-edit, read-to-edit ratio, and cache engagement from Claude Code transcripts |
| `cram ui [path]` | TUI dashboard — pending decisions, session efficiency, context health (requires `cram-ai[tui]`) |
| `cram benchmark [path]` | Show token and cost comparison across delivery strategies |
| `cram doctor [path]` | Health check — models, hooks, git, context files |
| `cram hook install\|uninstall` | Manage the git post-commit hook manually |

---

## Model providers

cram uses a cheap model for its maintenance calls (generating ARCHITECTURE.md, selecting files,
extracting excerpts). Set `AICONTEXT_MODEL` to any provider:

```bash
# Inside Claude Code — zero config, uses session credentials
cram init

# Anthropic API key
export ANTHROPIC_API_KEY=sk-...
export AICONTEXT_MODEL=anthropic/claude-haiku-4-5

# Google Gemini
export GEMINI_API_KEY=...
export AICONTEXT_MODEL=gemini/gemini-2.0-flash

# OpenAI
export OPENAI_API_KEY=sk-...
export AICONTEXT_MODEL=openai/gpt-4o-mini

# Ollama (local, free, no key needed)
export AICONTEXT_MODEL=ollama/mistral
cram init
```

Also supports: AWS Bedrock, GCP Vertex AI, Azure OpenAI, custom LiteLLM proxies with
`proxy.base_url` + `proxy.headers` (install `cram-ai[multi-provider]`).

---

## Environment variables

| Variable | Default | Description |
|---|---|---|
| `AICONTEXT_MODEL` | auto-detected | Model for context tasks — bare alias (`haiku`) or `provider/model` |
| `ANTHROPIC_API_KEY` | — | Optional inside Claude Code (uses session credentials) |
| `AICONTEXT_MAX_FILES` | `5` | Max files included in CURRENT_TASK.md per task |
| `AICONTEXT_MAX_LINES` | `300` | Max lines per file when extracting excerpts |
| `AICONTEXT_TASKS_PER_SESSION` | `4` | Assumed tasks per cache window (used by `cram benchmark`) |
| `CRAM_TASK_GRACE_SECONDS` | `600` | Seconds after `cram task` before a commit resets context |
| `CRAM_STALE_CRITICAL_COMMITS` | `10` | Commits since last sync that maps to staleness score 10 (critical). Lower = more sensitive. |
| `CRAM_BUDGET_ARCHITECTURE` | `2000` | Soft token budget for ARCHITECTURE.md — warns in `cram status` and `cram sync` when exceeded |
| `CRAM_BUDGET_DECISIONS` | `600` | Soft token budget for DECISIONS.md |
| `CRAM_BUDGET_GOTCHAS` | `400` | Soft token budget for GOTCHAS.md |
| `CRAM_BUDGET_TASK` | `800` | Soft token budget for CURRENT_TASK.md |
| `CRAM_PROVIDER` | `anthropic` | Pricing table for audit dollar attribution: `anthropic` / `openai` / `gemini` / `local` |
| `CRAM_PRICE_INPUT_PER_MTOK` | per provider | Override base input price ($/1M tokens) for audit cost estimates |
| `CRAM_CACHE_WRITE_MULT` | per provider | Override cache-write multiplier (1.25 on Anthropic) |
| `CRAM_CACHE_READ_MULT` | per provider | Override cache-read multiplier (0.10 on Anthropic) |
| `CRAM_AUDIT_TOK_PER_FILE` | `2500` | Assumed tokens per orientation file read in `cram audit` cost modeling |
| `CRAM_AUDIT_BIG_RESULT_BYTES` | `20000` | Serialized size above which a tool result counts as oversized in `cram audit` |

---

## 💰 Real-world token consumption

Without context pre-loading, an agent spends the first few exchanges of every session
re-discovering the codebase — reading files, running searches, building orientation from scratch.

**What a typical session consumes (no cram):**

| Phase | What happens | Tokens |
|---|---|---|
| Session start | System prompt + tool definitions + rules files | 3–8K |
| Orientation | `find` / `grep` / `read` calls to discover relevant files cold | 20–60K |
| Active work | Conversation, edits, test runs | 20–50K |
| Output | Code written, explanations | 5–15K |
| **Per task total** | | **50–130K** |

The orientation phase is pure re-discovery overhead — the agent reads roughly 8 files cold
each session to figure out where to work. cram replaces that with one `get_context()` call
returning ~1–2K tokens of targeted excerpts.

You can measure your actual overhead with `cram audit`. The read-to-edit ratio — reads before
first edit divided by total edits — tells you how much of each session is navigation vs. work.
Ratio > 5× means context isn't landing.

**What cram removes:**

cram eliminates the cold-start orientation overhead per session. It does **not** replace
the agent's productive reads (edits, tests, active work). The savings are orientation-only,
not total-session savings.

Run `cram benchmark` for a full token and cost breakdown across all three delivery strategies
and model tiers. Run `cram audit` to measure your actual read-to-edit ratio.

---

<details>
<summary>💸 <strong>Claude Code users: cache-write bonus</strong></summary>

This section is specific to Claude Code + Anthropic. The context layer is useful for any tool,
but Claude's prompt caching gives MCP delivery an additional cost advantage.

Anthropic's prompt cache has a 5-minute TTL. Content in the conversation **prefix** gets
cache-written at 1.25× the base input price on every new session and every TTL expiry.
Content that doesn't touch the prefix — like MCP tool results — isn't.

**File-based delivery vs MCP:**

| | File-based delivery (`--target claude`) | MCP (`get_context()`) |
|---|---|---|
| Where context lands | CLAUDE.md → front of prefix | Conversation tail (tool result) |
| Cache writes per session | N × task context tokens | 1 × tool definitions (~1–2K tokens) |
| Per-task context cost | 1.25× write per task change | 0.1× read after first session write |
| 10K-token context, 4 tasks | ~$0.09–0.15 in cache writes | ~$0.01 in cache writes |

The larger your context and the more tasks per session, the more the MCP path saves.

Run `cram benchmark` to model the exact numbers for your repo.

**The floor check:** the frozen prefix must exceed 2,048 tokens (Sonnet 4.6) or 4,096 tokens
(Opus 4.8 / Haiku 4.5) to cache at all. `cram benchmark` flags this if your context files are
below the threshold.

</details>

---

## Running tests

```bash
pip install pytest
pytest
```

No API key required — all model calls are mocked.

---

## License

MIT
