Metadata-Version: 2.4
Name: cram-ai
Version: 0.2.1
Summary: Stable context layer for AI coding tools — Haiku-generated, delivered via MCP to keep the prefix tiny
License-Expression: MIT
Project-URL: Homepage, https://github.com/vishbay/cram-ai
Project-URL: Repository, https://github.com/vishbay/cram-ai
Project-URL: Bug Tracker, https://github.com/vishbay/cram-ai/issues
Keywords: ai,llm,context,token,coding,claude,cursor,copilot
Classifier: Development Status :: 4 - Beta
Classifier: Environment :: Console
Classifier: Intended Audience :: Developers
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Classifier: Topic :: Utilities
Requires-Python: >=3.10
Description-Content-Type: text/markdown
Requires-Dist: anthropic>=0.40.0
Requires-Dist: tomli>=2.0; python_version < "3.11"
Provides-Extra: multi-provider
Requires-Dist: litellm>=1.40.0; extra == "multi-provider"
Provides-Extra: tray
Requires-Dist: pystray>=0.19.0; extra == "tray"
Requires-Dist: pillow>=10.0; extra == "tray"
Requires-Dist: pywebview>=5.0; extra == "tray"
Requires-Dist: flask>=3.0; extra == "tray"
Provides-Extra: mcp
Requires-Dist: mcp>=1.0.0; extra == "mcp"
Provides-Extra: dev
Requires-Dist: pytest>=8.0; extra == "dev"

# cram-ai

[![PyPI](https://img.shields.io/pypi/v/cram-ai?color=%237b2fff&style=flat-square)](https://pypi.org/project/cram-ai/)
[![Python](https://img.shields.io/pypi/pyversions/cram-ai?color=%2300f5d4&style=flat-square)](https://pypi.org/project/cram-ai/)
[![License](https://img.shields.io/github/license/vishbay/cram-ai?color=%23f72585&style=flat-square)](LICENSE)

> **Your AI coding tool starts fresh every session. cram gives it memory.**

cram maintains a curated context layer for your repo — architecture, symbols, decisions, known
gotchas, and focused file excerpts. Your AI tool loads exactly what it needs instead of
re-discovering your codebase from scratch every time.

Works with **Claude Code, Cursor, Windsurf, Zed, Codex, GitHub Copilot**, and any tool that
reads a file on startup.

---

## Install

```bash
# Standard — includes MCP server for Claude Code / Cursor / Windsurf / Zed
pip install 'cram-ai[mcp]'

# With macOS menu bar app
pip install 'cram-ai[mcp,tray]'

# With additional model providers (OpenAI, Gemini, Bedrock, Ollama …)
pip install 'cram-ai[mcp,multi-provider]'

# Homebrew (macOS)
brew tap vishbay/cram-ai && brew install cram-ai
```

---

## Quick start

```bash
cd your-repo

# 1. One-time setup
cram init
#   → scans your repo, generates ARCHITECTURE.md + SYMBOLS.md via a cheap model
#   → scaffolds DECISIONS.md + GOTCHAS.md for you to fill in
#   → installs a git post-commit hook to keep context fresh

# 2. Fill in the manual files — this is where cram's real value lives
vim .cram-ai-context/DECISIONS.md   # architectural invariants, naming conventions
vim .cram-ai-context/GOTCHAS.md     # non-obvious traps that burned your team

# 3. Commit so teammates get the context layer too
git add .cram-ai-context/ CLAUDE.md
git commit -m "chore: init cram-ai context layer"
```

Then wire up your tool of choice — [MCP](#mcp-delivery) or [prefix injection](#prefix-injection).

---

## The context layer

cram maintains five files in `.cram-ai-context/`. Two are auto-generated. Two are manual. One is
generated per task.

```
your-repo/
└── .cram-ai-context/
    ├── ARCHITECTURE.md   ← auto  · repo structure, tech stack, key files
    ├── SYMBOLS.md        ← auto  · every source file mapped to its public identifiers
    ├── DECISIONS.md      ← manual · architectural commitments your team has made
    ├── GOTCHAS.md        ← manual · non-obvious traps, foot-guns, things that burn people
    └── CURRENT_TASK.md   ← per-task · focused excerpts for the current work
```

**Auto-generated** (`ARCHITECTURE.md`, `SYMBOLS.md`):
- Generated by `cram init`, refreshed automatically via the git post-commit hook after each commit
- `SYMBOLS.md` is extracted with regular expressions: deterministic, no LLM cost, byte-stable across runs
- `ARCHITECTURE.md` uses a cheap model (Haiku / Gemini Flash / GPT-4o Mini)
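
To make "regex, deterministic, byte-stable" concrete, here is a minimal sketch of regex-based symbol extraction for Python sources. It is not cram's actual extractor: the pattern, the function names (`public_symbols`, `symbols_markdown`), and the output layout are all assumptions made for illustration.

```python
import re

# Assumed pattern: top-level `def` / `async def` / `class` names that do not
# start with an underscore. Indented (nested) definitions are ignored.
PUBLIC_DEF = re.compile(
    r"^(?:async\s+def|def|class)\s+([A-Za-z][A-Za-z0-9_]*)", re.MULTILINE
)


def public_symbols(source: str) -> list[str]:
    """Return the public top-level identifiers of one file, sorted."""
    return sorted(set(PUBLIC_DEF.findall(source)))


def symbols_markdown(files: dict[str, str]) -> str:
    """Render a path -> identifiers map as markdown.

    Paths and names are sorted, so the same input always yields the same bytes.
    """
    lines = []
    for path in sorted(files):
        names = public_symbols(files[path])
        if names:
            lines.append(f"- `{path}`: {', '.join(names)}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    demo = {"api/users.py": "def list_users():\n    pass\n\ndef _helper():\n    pass\n"}
    print(symbols_markdown(demo), end="")
```

Sorting both paths and names is what keeps the output stable between runs; nothing in the sketch depends on dict ordering or on a model call.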

**Manual** (`DECISIONS.md`, `GOTCHAS.md`):
- Scaffolded by `cram init` — you fill them in over time
- `DECISIONS.md`: "we use X", "never do Y", naming conventions, non-obvious invariants
- `GOTCHAS.md`: silent side effects, middleware gaps, surprising nulls — things grep can't tell you
- Append entries with `cram decide "..."` or `cram gotcha "..."`

**Per-task** (`CURRENT_TASK.md`):
When you call `get_context("task description")`, cram runs a four-stage pipeline: reads the
symbol index, asks a cheap model to identify relevant files, extracts identifier-focused excerpts
from those files, and writes the result. Typically 800–1,500 tokens covering exactly the files
that matter for the task.
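
The four stages can be sketched in a few lines of Python. This is an illustration only: cram asks a cheap model to pick files, whereas the sketch substitutes a keyword-overlap heuristic so it runs offline, and every function name and the output layout here are hypothetical.

```python
import re


def select_files(task: str, symbol_index: dict[str, list[str]], max_files: int = 5) -> list[str]:
    """Stage 2 stand-in: rank files by how many identifiers contain a task word.

    cram uses a cheap model for this step; keyword overlap is a placeholder.
    """
    words = {w for w in re.findall(r"[a-z0-9]+", task.lower()) if len(w) >= 4}
    scored = []
    for path, names in symbol_index.items():
        hits = sum(1 for name in names if any(w in name.lower() for w in words))
        if hits:
            scored.append((-hits, path))
    return [path for _, path in sorted(scored)[:max_files]]


def extract_excerpt(source: str, identifiers: list[str], context: int = 1) -> str:
    """Stage 3: keep only the lines near a mention of one of the identifiers."""
    lines = source.splitlines()
    keep: set[int] = set()
    for i, line in enumerate(lines):
        if any(name in line for name in identifiers):
            keep.update(range(max(0, i - context), min(len(lines), i + context + 1)))
    return "\n".join(lines[i] for i in sorted(keep))


def build_task_context(task: str, symbol_index: dict[str, list[str]],
                       sources: dict[str, str], max_files: int = 5) -> str:
    """Stages 1-4: read the index, pick files, extract excerpts, render markdown."""
    sections = [f"# Current task\n\n{task}\n"]
    for path in select_files(task, symbol_index, max_files):
        excerpt = extract_excerpt(sources[path], symbol_index[path])
        sections.append(f"## {path}\n\n{excerpt}\n")
    return "\n".join(sections)


if __name__ == "__main__":
    index = {"api/users.py": ["list_users", "paginate"], "cli.py": ["main"]}
    sources = {
        "api/users.py": "import db\n\ndef list_users(page):\n    return paginate(page)\n",
        "cli.py": "def main():\n    pass\n",
    }
    print(build_task_context("add pagination to the users endpoint", index, sources))
```

The `max_files=5` default mirrors the documented `AICONTEXT_MAX_FILES` default; the one-line context window around each match is an arbitrary choice for the sketch.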

> **What this replaces:** the agent spending 3–5 tool calls grep-ing and reading files to orient
> itself at the start of every session. With cram the context arrives in one call and includes
> knowledge — decisions, gotchas — that the agent can't discover by searching.

---

## MCP delivery

If your tool supports MCP (Claude Code, Cursor, Windsurf, Zed, Codex CLI), wire up the cram MCP
server once and the tool can call context tools directly.

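**One-time server config** (the `mcpServers` JSON shape below is the common one; if your client uses a different key or file format, adapt it using the table and the client's own docs):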

```json
{
  "mcpServers": {
    "cram-ai": {
      "command": "cram",
      "args": ["mcp", "--repo", "/absolute/path/to/your-repo"]
    }
  }
}
```
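
If you would rather script this than hand-edit JSON, the following sketch merges the entry above into an existing config dict and resolves the repo path to an absolute one. The `with_cram_server` helper is hypothetical, and the sketch assumes your client reads the `mcpServers` shape shown above.

```python
import json
from pathlib import Path


def with_cram_server(config: dict, repo: str) -> dict:
    """Return a copy of an MCP client config with a `cram-ai` server entry added.

    Existing servers and unrelated settings are preserved.
    """
    repo_path = Path(repo).expanduser().resolve()  # --repo takes an absolute path
    servers = dict(config.get("mcpServers", {}))
    servers["cram-ai"] = {
        "command": "cram",
        "args": ["mcp", "--repo", str(repo_path)],
    }
    return {**config, "mcpServers": servers}


if __name__ == "__main__":
    existing = {"mcpServers": {}}  # stand-in for the parsed contents of your config file
    print(json.dumps(with_cram_server(existing, "."), indent=2))
```

Building a new dict instead of mutating the input makes it easy to diff the result against the current file before writing anything back.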

| Client | Config file |
|---|---|
| Claude Code | `.claude/settings.json` |
| Cursor | `.cursor/mcp.json` or Cursor Settings → MCP |
| Windsurf | Windsurf MCP settings |
| Zed | Zed assistant settings → context servers |
| Codex CLI | `~/.codex/config.yaml` → `mcpServers` |

**Available MCP tools:**

| Tool | What it returns | When to call it |
|---|---|---|
| `get_context(task='')` | Runs symbol lookup → file selection → excerpt extraction. No-arg: returns last CURRENT_TASK.md without re-running the LLM. | First thing every session |
| `get_architecture()` | ARCHITECTURE.md — repo structure, tech stack, key files | Orientation in an unfamiliar area |
| `get_symbols(query='')` | SYMBOLS.md — source files mapped to public identifiers, optionally filtered | Finding where a function is defined |
| `get_decisions()` | DECISIONS.md — architectural commitments | Before making a design choice |
| `get_gotchas()` | GOTCHAS.md — non-obvious traps and foot-guns | Before touching an unfamiliar area |
| `add_file(path, identifiers='')` | Appends a file's excerpts to CURRENT_TASK.md | When a mid-task discovery needs new context |

---

## Prefix injection

For tools that don't support MCP, run `cram task "..." --target <tool>` before your session.
cram writes focused context into the file the tool auto-loads at startup.

```bash
# GitHub Copilot
cram task "add pagination to the users endpoint" --target copilot
# → writes to .github/cram-task.md (one-time: add an include line to copilot-instructions.md)

# Cursor (no-MCP fallback)
cram task "add pagination to the users endpoint" --target cursor
# → writes to .cursor/rules/cram-task.md

# Windsurf (no-MCP fallback)
cram task "add pagination to the users endpoint" --target windsurf
# → writes to .windsurf/rules/cram-task.md

# All targets at once
cram task "add pagination to the users endpoint" --target all
```

| Target | File written |
|---|---|
| `cursor` | `.cursor/rules/cram-task.md` |
| `windsurf` | `.windsurf/rules/cram-task.md` |
| `copilot` | `.github/cram-task.md` |
| `codex` | `.cram-ai-context/AGENTS.md` |
| `claude` | `CLAUDE.md` (escape hatch; prefer MCP for Claude Code) |
| `all` | All of the above |
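
The target table is a plain lookup. A sketch of how `--target` could resolve to file paths is below; the paths are copied from the table, while `TARGET_FILES` and `resolve_targets` are hypothetical names rather than cram internals.

```python
# Paths copied from the table above; the names in this sketch are hypothetical.
TARGET_FILES = {
    "cursor": ".cursor/rules/cram-task.md",
    "windsurf": ".windsurf/rules/cram-task.md",
    "copilot": ".github/cram-task.md",
    "codex": ".cram-ai-context/AGENTS.md",
    "claude": "CLAUDE.md",
}


def resolve_targets(target: str) -> list[str]:
    """Map a --target value to the file(s) that would be written."""
    if target == "all":
        return list(TARGET_FILES.values())
    if target not in TARGET_FILES:
        known = ", ".join(sorted(TARGET_FILES) + ["all"])
        raise ValueError(f"unknown target {target!r}; expected one of: {known}")
    return [TARGET_FILES[target]]


if __name__ == "__main__":
    for path in resolve_targets("all"):
        print(path)
```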

---

## Daily workflow

```bash
# Before a session — MCP path (Claude Code / Cursor / Windsurf / Zed)
# Nothing to run. The agent calls get_context() itself.

# Before a session — injection path (Copilot / no-MCP tools)
cram task "fix the rate limiter" --target copilot

# Log a decision while working
cram decide "use cursor-based pagination, not offset — offset breaks under concurrent writes"

# Log a gotcha you just found
cram gotcha "the users.email column is nullable in prod despite NOT NULL in schema.prisma"

# Extend grace period if you commit mid-task (prevents context reset)
cram continue

# Check context freshness
cram status
```

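After every commit the git post-commit hook runs `cram sync` automatically to refresh
`ARCHITECTURE.md` and `SYMBOLS.md`. A session grace period (`CRAM_TASK_GRACE_SECONDS`, 600 seconds
after `cram task` by default) prevents sync from firing and resetting context while you're
mid-task; `cram continue` extends it.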
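
The grace-period check boils down to a timestamp comparison. The sketch below shows one plausible reading, assuming the window opens when `cram task` runs and that a commit exactly at the boundary counts as outside it; `commit_resets_context` is a hypothetical name and cram's real bookkeeping may differ.

```python
import time


def commit_resets_context(task_started_at: float, committed_at: float,
                          grace_seconds: float = 600) -> bool:
    """Return True when a commit lands after the grace window has elapsed.

    The 600-second default matches the documented CRAM_TASK_GRACE_SECONDS default.
    Treating the exact boundary as "elapsed" is an assumption of this sketch.
    """
    return committed_at - task_started_at >= grace_seconds


if __name__ == "__main__":
    started = time.time() - 120  # pretend `cram task` ran two minutes ago
    print(commit_resets_context(started, time.time()))
```

Under this reading, `cram continue` would amount to moving `task_started_at` forward so the window starts over.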

---

## CLI reference

| Command | What it does |
|---|---|
| `cram init [path] [--team]` | One-time setup — scans repo, generates context files, installs git hook |
| `cram mcp [--repo PATH]` | Start MCP server (stdio). Wire into your tool's settings once; clients launch it automatically. |
| `cram task "..." [--target T]` | Run context pipeline, write CURRENT_TASK.md, optionally inject into tool's auto-loaded file |
| `cram sync [path]` | Refresh ARCHITECTURE.md + SYMBOLS.md from current repo state |
| `cram decide "..." [path]` | Append a dated architectural decision to DECISIONS.md |
| `cram gotcha "..." [path]` | Append a non-obvious trap to GOTCHAS.md |
| `cram continue [path]` | Extend grace period — keep context across a mid-task commit |
| `cram status [path]` | Show each context file with age, line count, staleness warning |
| `cram benchmark [path]` | Show token and cost comparison across delivery strategies |
| `cram doctor [path]` | Health check — models, hooks, git, context files |
| `cram hook install\|uninstall` | Manage the git post-commit hook manually |
| `cram menu [path]` | Launch macOS menu bar app |
| `cram autostart on\|off` | Start menu bar app at login (macOS) |

---

## Model providers

cram uses a cheap model for its maintenance calls (generating ARCHITECTURE.md, selecting files,
extracting excerpts). Set `AICONTEXT_MODEL` to a bare alias (`haiku`) or a `provider/model` string
to choose it:

```bash
# Inside Claude Code — zero config, uses session credentials
cram init

# Anthropic API key
export ANTHROPIC_API_KEY=sk-...
export AICONTEXT_MODEL=anthropic/claude-haiku-4-5

# Google Gemini
export GEMINI_API_KEY=...
export AICONTEXT_MODEL=gemini/gemini-2.0-flash

# OpenAI
export OPENAI_API_KEY=sk-...
export AICONTEXT_MODEL=openai/gpt-4o-mini

# Ollama (local, free, no key needed)
export AICONTEXT_MODEL=ollama/mistral
cram init
```

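Also supports AWS Bedrock, GCP Vertex AI, Azure OpenAI, and custom LiteLLM proxies. Additional
providers, including the Gemini, OpenAI, and Ollama examples above, need the `multi-provider`
extra: `pip install 'cram-ai[mcp,multi-provider]'`.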

---

## Environment variables

| Variable | Default | Description |
|---|---|---|
| `AICONTEXT_MODEL` | auto-detected | Model for context tasks — bare alias (`haiku`) or `provider/model` |
| `ANTHROPIC_API_KEY` | — | Optional inside Claude Code (uses session credentials) |
| `AICONTEXT_MAX_FILES` | `5` | Max files included in CURRENT_TASK.md per task |
| `AICONTEXT_MAX_LINES` | `300` | Max lines per file when extracting excerpts |
| `AICONTEXT_TASKS_PER_SESSION` | `4` | Assumed tasks per cache window (used by `cram benchmark`) |
| `CRAM_TASK_GRACE_SECONDS` | `600` | Seconds after `cram task` before a commit resets context |

---

## 💰 Real-world token consumption

Without context pre-loading, an agent spends the first few exchanges of every session
re-discovering the codebase — reading files, running searches, building orientation from scratch.

**What a typical session consumes (no cram):**

| Phase | What happens | Tokens |
|---|---|---|
| Session start | System prompt + tool definitions + rules files | 3–8K |
| Orientation | `find` / `grep` / `read` calls to discover relevant files cold | 20–60K |
| Active work | Conversation, edits, test runs | 20–50K |
| Output | Code written, explanations | 5–15K |
| **Per task total** | | **50–130K** |

The orientation phase (30–50% of every session) is pure re-discovery overhead — the agent reads
10–20 files cold before it knows where to work. cram replaces that with one `get_context()` call
returning ~1–2K tokens of targeted excerpts.

**Scaled to a full day:**

| Usage | Tasks/day | Est. tokens/day | Cost at Sonnet 4.6 |
|---|---|---|---|
| Light | 2 short tasks | ~150K | ~$0.50 |
| **Average** | **4 feature tasks** | **~400K** | **~$1.20** |
| Heavy | 6+ complex tasks | ~900K | ~$2.70 |

For the average developer (~400K tokens/day), roughly **120–200K tokens/day is orientation
overhead** — paid on every session, for every task, from scratch.
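
The daily figures are simple arithmetic. The sketch below prices every token at the Sonnet 4.6 base input rate of $3.00 / MTok, which is the simplification the table's numbers are consistent with (it ignores output pricing and caching); the function names are illustrative.

```python
SONNET_INPUT_USD_PER_MTOK = 3.00  # base input price quoted in this README


def daily_cost(tokens_per_day: int, usd_per_mtok: float = SONNET_INPUT_USD_PER_MTOK) -> float:
    """Cost in USD when every token is billed at the base input price."""
    return tokens_per_day / 1_000_000 * usd_per_mtok


def orientation_overhead(tokens_per_day: int, low: float = 0.30, high: float = 0.50) -> tuple[int, int]:
    """Token range spent on orientation, using the 30-50% share stated above."""
    return round(tokens_per_day * low), round(tokens_per_day * high)


if __name__ == "__main__":
    for label, tokens in [("Light", 150_000), ("Average", 400_000), ("Heavy", 900_000)]:
        print(f"{label}: ${daily_cost(tokens):.2f}/day")
    low, high = orientation_overhead(400_000)
    print(f"Orientation overhead at 400K/day: {low:,} to {high:,} tokens")
```

Under these assumptions 400K tokens comes to $1.20 and 900K to $2.70, matching the table; 150K comes to $0.45, which the table rounds to ~$0.50.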

**The menu bar app (`cram menu`, installed with the `tray` extra) shows live daily estimates**
based on your actual repo size (4 sessions × 4 tasks/day).
Use the model selector in the tray popup to see estimates for your model:

| Model | Base input price |
|---|---|
| Haiku 4.5 | $1.00 / MTok |
| Sonnet 4.6 | $3.00 / MTok |
| Opus 4 | $5.00 / MTok |

| Metric | What it shows |
|---|---|
| Context reduction | How much smaller cram context is vs full repo scan |
| Without cram/day | Estimated daily cost if the agent reads the full repo each task |
| With cram/day | Estimated daily cost using the frozen context layer (MCP path) |
| Saved/day | The difference — scales with repo size and session frequency |

Run `cram benchmark` for a full breakdown across all three delivery strategies and all model tiers.

---

<details>
<summary>💸 <strong>Claude Code users: cache-write bonus</strong></summary>

This section is specific to Claude Code + Anthropic. The context layer is useful for any tool,
but Claude's prompt caching gives MCP delivery an additional cost advantage.

Anthropic's prompt cache has a 5-minute TTL. Content in the conversation **prefix** gets
cache-written at 1.25× the base input price on every new session and every TTL expiry.
Content that lands outside the prefix, such as MCP tool results, does not trigger those
repeated cache writes.

**Prefix injection vs MCP:**

| | Prefix injection (`--target claude`) | MCP (`get_context()`) |
|---|---|---|
| Where context lands | CLAUDE.md → front of prefix | Conversation tail (tool result) |
| Cache writes per session | N × task context tokens | 1 × tool definitions (~1–2K tokens) |
| Per-task context cost | 1.25× write per task change | 0.1× read after first session write |
| 10K-token context, 4 tasks | ~$0.09–0.15 in cache writes | ~$0.01 in cache writes |

The larger your context and the more tasks per session, the more the MCP path saves.

Run `cram benchmark` to model the exact numbers for your repo.
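
As a rough check on the last table row, the sketch below applies the 1.25× cache-write multiplier at the Sonnet 4.6 base input price of $3.00 / MTok. It assumes prefix injection rewrites the full 10K-token context once per task, and that the MCP path writes about 2K tokens of tool definitions once; `cache_write_cost` is an illustrative helper, not part of cram.

```python
CACHE_WRITE_MULTIPLIER = 1.25  # cache writes cost 1.25x the base input price


def cache_write_cost(tokens: int, writes: int, usd_per_mtok: float) -> float:
    """USD spent on cache writes for `writes` rewrites of `tokens` tokens."""
    return tokens * writes * CACHE_WRITE_MULTIPLIER * usd_per_mtok / 1_000_000


if __name__ == "__main__":
    prefix = cache_write_cost(tokens=10_000, writes=4, usd_per_mtok=3.00)
    mcp = cache_write_cost(tokens=2_000, writes=1, usd_per_mtok=3.00)
    print(f"prefix injection: ${prefix:.2f}")
    print(f"MCP tool definitions: ${mcp:.4f}")
```

With these inputs the prefix path comes to $0.15, the upper end of the table's range, and the MCP path to under one cent.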

**The floor check:** the frozen prefix must exceed 2,048 tokens (Sonnet 4.6) or 4,096 tokens
(Opus 4.8 / Haiku 4.5) to cache at all. `cram benchmark` flags this if your context files are
below the threshold.

</details>

---

## Running tests

```bash
# From a checkout of the repository; the dev extra pulls in pytest
pip install -e '.[dev]'
pytest
```

No API key required — all model calls are mocked.

---

## License

MIT
