Metadata-Version: 2.4
Name: cram-ai
Version: 0.1.0
Summary: Cut AI coding token costs by 96-98% — curated context, MCP server, and tray app for AI coding tools
License-Expression: MIT
Project-URL: Homepage, https://github.com/vishbay/cram-ai
Project-URL: Repository, https://github.com/vishbay/cram-ai
Project-URL: Bug Tracker, https://github.com/vishbay/cram-ai/issues
Keywords: ai,llm,context,token,coding,claude,cursor,copilot
Classifier: Development Status :: 4 - Beta
Classifier: Environment :: Console
Classifier: Intended Audience :: Developers
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Classifier: Topic :: Utilities
Requires-Python: >=3.10
Description-Content-Type: text/markdown
Requires-Dist: anthropic>=0.40.0
Requires-Dist: tomli>=2.0; python_version < "3.11"
Provides-Extra: multi-provider
Requires-Dist: litellm>=1.40.0; extra == "multi-provider"
Provides-Extra: mac
Requires-Dist: rumps>=0.4.0; extra == "mac"
Provides-Extra: tray
Requires-Dist: pystray>=0.19.0; extra == "tray"
Requires-Dist: pillow>=10.0; extra == "tray"
Requires-Dist: pywebview>=5.0; extra == "tray"
Requires-Dist: flask>=3.0; extra == "tray"
Provides-Extra: mcp
Requires-Dist: mcp>=1.0.0; extra == "mcp"
Provides-Extra: dev
Requires-Dist: pytest>=8.0; extra == "dev"

# cram-ai

> Slash AI coding token costs by injecting only what the model needs — nothing more.

Many AI coding tools auto-index your entire repo at session start. That indexing generates **cache writes**, the most expensive kind of input token: on Anthropic's pricing a cache write costs 1.25× the base input rate, roughly 12.5× the cost of a cache read. cram-ai replaces auto-indexing with a small set of curated files that give the model exactly what it needs: repo structure, key decisions, and focused excerpts of only the files relevant to your current task.

---

## Benchmarks

### cram-ai itself (31 source files, Python CLI tool)

| | Tokens | Sonnet cost/session | Opus cost/session |
|---|---|---|---|
| **Without cram** — full repo auto-indexed | 49,683 | $0.186 | $0.932 |
| **Without cram** — orientation set only¹ | 19,687 | $0.074 | $0.369 |
| **With cram** — ARCHITECTURE + SYMBOLS + task context | 1,898 | $0.007 | $0.036 |

**96% token reduction. $0.18 saved per session. $18 over 100 sessions (Sonnet).**

---

### pallets/flask (118 source files, Python web framework)

| | Tokens | Sonnet cost/session | Opus cost/session |
|---|---|---|---|
| **Without cram** — full repo auto-indexed | 171,641 | $0.644 | $3.22 |
| **Without cram** — orientation set only¹ | 60,863 | $0.228 | $1.14 |
| **With cram** — ARCHITECTURE + SYMBOLS + task context | 5,929 | $0.022 | $0.111 |

**96.5% token reduction. $0.62 saved per session. $62 over 100 sessions (Sonnet).**

---

### hoppscotch/hoppscotch (2,151 source files, TypeScript monorepo)

| | Tokens | Sonnet cost/session |
|---|---|---|
| **Without cram** — full repo auto-indexed | 418,697 | $1.57 |
| **With cram** | 7,239 | $0.027 |

**98.3% reduction. $154 saved over 100 sessions (Sonnet).**

---

> ¹ *Orientation set = file tree + README + pyproject.toml/package.json + 5 largest source files. A realistic estimate for tools that don't index everything.*  
> Pricing assumes cache-write rates of $3.75 per million tokens (Claude Sonnet 4.6) and $18.75 per million tokens (Opus 4.8). Savings scale with team size and session frequency.
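
The per-session figures in the tables follow from a single multiplication: token count times the cache-write price per million tokens. The sketch below reproduces that arithmetic for the cram-ai row; the function names are illustrative and not part of the tool.

```python
# Cache-write prices in USD per million tokens, taken from the pricing note above.
PRICE_PER_MILLION = {"sonnet": 3.75, "opus": 18.75}


def session_cost(tokens: int, model: str) -> float:
    """Cost of writing `tokens` tokens to the cache once."""
    return tokens * PRICE_PER_MILLION[model] / 1_000_000


def reduction_pct(before: int, after: int) -> float:
    """Percentage of tokens removed by going from `before` to `after`."""
    return (1 - after / before) * 100


# Token counts from the cram-ai benchmark table.
full, curated = 49_683, 1_898
print(f"{session_cost(full, 'sonnet'):.3f}")     # 0.186
print(f"{session_cost(curated, 'sonnet'):.3f}")  # 0.007
print(f"{reduction_pct(full, curated):.1f}")     # 96.2
```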

---

## How it works

AI agents spend many of their tokens on **orientation**: finding relevant files, understanding structure, reading configs. cram-ai replaces that work with a curated map that the model reads instead of building its own.

```
your-repo/
└── .cram-ai-context/
    ├── ARCHITECTURE.md   ← repo structure, tech stack, key files (auto-generated by Haiku)
    ├── DECISIONS.md      ← architectural decisions you want the AI to respect
    ├── SYMBOLS.md        ← public function/class index across all source files (auto-generated)
    └── CURRENT_TASK.md   ← per-session: task + focused excerpts of relevant files
```

**`SYMBOLS.md`** is the key accuracy improvement. Rather than asking a model to guess which files matter based on filenames alone, cram maps every source file to its public identifiers (`api/routes.py: handle_rate_limit, check_throttle, apply_backoff`). The model uses that map to select files *and* identify the exact functions to excerpt — so "fix the rate limiter" finds `check_throttle` even if the words don't match.
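
To make the idea of a symbol index concrete, here is a minimal sketch that maps one Python file to its public top-level names using the standard-library `ast` module. It is an illustration only: `public_symbols` and `index_line` are hypothetical names, the "no leading underscore" rule for what counts as public is an assumption, and cram's actual indexer (which also covers non-Python sources) may work differently.

```python
import ast


def public_symbols(source: str) -> list[str]:
    """Return top-level function and class names that do not start with an underscore."""
    tree = ast.parse(source)
    return [
        node.name
        for node in tree.body
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef))
        and not node.name.startswith("_")
    ]


def index_line(path: str, source: str) -> str:
    """Format one index entry as `path: name, name, ...`."""
    return f"{path}: {', '.join(public_symbols(source))}"


example = '''
def handle_rate_limit(request): ...
def check_throttle(key): ...
def _load_state(): ...
def apply_backoff(attempt): ...
'''
print(index_line("api/routes.py", example))
# api/routes.py: handle_rate_limit, check_throttle, apply_backoff
```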

Run `cram task "..."` before every session. It works in four stages:

1. **`[1/4]`** Loads `SYMBOLS.md` (for example, 455 identifiers across 65 files) with zero LLM calls
2. **`[2/4]`** Sends architecture + symbol index to Haiku → model returns `path | RelevantFunc, OtherClass`
3. **`[3/4]`** Extracts identifier-focused excerpts: only the lines that contain those functions, plus a few lines of surrounding context
4. **`[4/4]`** Writes to your tool's instruction file and warns if the context is below the minimum cacheable size for your model

All stages stream live to the popup so you see exactly what's happening.
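
Stages 2 and 3 can be sketched as follows, assuming the model replies with one `path | Name, OtherName` line per file as described above. The helper names, the substring-matching rule, and the `context` parameter are assumptions made for illustration; the `max_lines` default mirrors the documented `AICONTEXT_MAX_EXCERPT_LINES` default of 80.

```python
def parse_selection(reply: str) -> dict[str, list[str]]:
    """Parse lines of the form `path | Name, OtherName` into {path: [names]}."""
    selection: dict[str, list[str]] = {}
    for line in reply.splitlines():
        if "|" not in line:
            continue  # skip blank lines or commentary from the model
        path, _, names = line.partition("|")
        selection[path.strip()] = [n.strip() for n in names.split(",") if n.strip()]
    return selection


def excerpt(source: str, names: list[str], context: int = 2, max_lines: int = 80) -> list[str]:
    """Keep lines that mention a selected name, plus `context` lines on either side."""
    lines = source.splitlines()
    keep: set[int] = set()
    for i, line in enumerate(lines):
        if any(name in line for name in names):
            keep.update(range(max(0, i - context), min(len(lines), i + context + 1)))
    return [lines[i] for i in sorted(keep)][:max_lines]


print(parse_selection("api/routes.py | check_throttle, apply_backoff\n"))
# {'api/routes.py': ['check_throttle', 'apply_backoff']}
```

Matching on substrings is deliberately crude; a real implementation would more likely locate the full body of each selected function.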

---

## Quick start

```bash
pip install cram-ai

cd your-repo
cram init                              # one-time setup — scans repo, generates docs, indexes symbols
cram task "add login validation"       # run before every session
# → context pre-loaded into your AI tool
cram sync                              # run after every commit (or fires automatically via git hook)
```

**First command to context ready: under 60 seconds.**

---

## CLI commands

| Command | When to run | What it does |
|---|---|---|
| `cram init` | Once per repo | Scans structure, generates `ARCHITECTURE.md` + `SYMBOLS.md` via Haiku |
| `cram task "..."` | Before every session | Identifies relevant files by symbol, inlines focused excerpts |
| `cram continue` | Mid-session before committing | Extends grace period — prevents context reset on mid-task commits |
| `cram sync` | After every commit | Updates `ARCHITECTURE.md` + `SYMBOLS.md` from git diff |
| `cram decide "..."` | When making arch choices | Appends a dated decision entry to `DECISIONS.md` |
| `cram status` | Anytime | Shows `.cram-ai-context/` files and freshness |

---

## Provider support

cram-ai is model-agnostic. Set `AICONTEXT_MODEL` to a `provider/model` string for the provider you want:

```bash
# Claude CLI (default — works inside Claude Code with no API key)
cram init

# Anthropic SDK
export ANTHROPIC_API_KEY=sk-...
export AICONTEXT_MODEL=anthropic/claude-haiku-4-5-20251001
cram init

# OpenAI
export OPENAI_API_KEY=sk-...
export AICONTEXT_MODEL=openai/gpt-4o-mini
cram init

# Google Gemini
export GEMINI_API_KEY=...
export AICONTEXT_MODEL=gemini/gemini-2.0-flash
cram init

# Local (Ollama — free, no key needed)
export AICONTEXT_MODEL=ollama/mistral
cram init
```

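Also supports AWS Bedrock, GCP Vertex AI, Azure OpenAI, and custom LiteLLM proxies, auto-discovered from environment variables and credentials. The package metadata lists LiteLLM under the optional `multi-provider` extra, so non-Anthropic providers may require `pip install "cram-ai[multi-provider]"`.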

---

## Session discipline

The context files handle orientation. These rules handle the rest:

1. **Run `cram task "..."` before every session** — never let the model hunt for files itself.
2. **Hard session boundary** — end the session the moment a feature works. New code = growing context = rising cost.
3. **Mid-task commit?** Run `cram continue` first to extend the grace period.
4. **Run `cram sync` after every commit** — keeps `ARCHITECTURE.md` and `SYMBOLS.md` accurate.
5. **Architectural decision?** Run `cram decide "use Redis for sessions"` — keeps `DECISIONS.md` current without opening the file.
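
The grace period behind rule 3 can be pictured as a simple time comparison. This sketch assumes that a commit resets the context only once more than `CRAM_TASK_GRACE_SECONDS` (default 600) have passed since `cram task`, and that `cram continue` restarts that window; both function names and the restart behaviour are assumptions, not confirmed details of the tool.

```python
def commit_resets_context(task_started_at: float, commit_at: float, grace_seconds: int = 600) -> bool:
    """True when the commit lands after the grace period has expired."""
    return commit_at - task_started_at > grace_seconds


def extend_grace(task_started_at: float, now: float) -> float:
    """Model `cram continue` as restarting the grace window from `now` (assumed behaviour)."""
    return max(task_started_at, now)


# Times are seconds since `cram task` was run.
print(commit_resets_context(0, 300))   # False: still inside the 600 s window
print(commit_resets_context(0, 900))   # True: the window has expired
started = extend_grace(0, 800)
print(commit_resets_context(started, 900))  # False: the window was restarted at 800 s
```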

---

## Environment variables

| Variable | Default | Description |
|---|---|---|
| `AICONTEXT_MODEL` | auto-detected | Model for context tasks — bare alias or `provider/model` |
| `ANTHROPIC_API_KEY` | — | Anthropic API key (optional inside Claude Code) |
| `AICONTEXT_MAX_FILES` | `5` | Max files inlined per task |
| `AICONTEXT_MAX_LINES` | `300` | Max lines in `ARCHITECTURE.md` |
| `AICONTEXT_MAX_EXCERPT_LINES` | `80` | Max lines excerpted per file in `CURRENT_TASK.md` |
| `CRAM_TASK_GRACE_SECONDS` | `600` | Seconds after `cram task` before a commit resets context |

---

## Works with any AI coding tool

| Tool | How context loads |
|---|---|
| **Claude Code** | Reads `.cram-ai-context/` recursively — all files auto-loaded |
| **Cursor** | Writes to `.cursor/rules/cram-task.md` — auto-loaded by Cursor |
| **Windsurf** | Writes to `.windsurf/rules/cram-task.md` — auto-loaded |
| **Codex** | Writes to `.cram-ai-context/AGENTS.md` — auto-loaded |
| **GitHub Copilot** | Writes to `.github/cram-task.md` — include once in `copilot-instructions.md` |

For non-Claude tools, cram automatically prepends a compact architecture summary so the model has repo orientation even without recursive file loading.
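
The per-tool behaviour in the table amounts to a lookup from tool to destination file, followed by a write that puts the architecture summary ahead of the task context. The sketch below shows that shape under stated assumptions: the paths are copied from the table, while `TASK_FILE`, `write_task_context`, and the exact file layout are hypothetical.

```python
from pathlib import Path

# Destination of the per-task file for each non-Claude tool, copied from the table above.
TASK_FILE = {
    "cursor": ".cursor/rules/cram-task.md",
    "windsurf": ".windsurf/rules/cram-task.md",
    "codex": ".cram-ai-context/AGENTS.md",
    "copilot": ".github/cram-task.md",
}


def write_task_context(repo: Path, tool: str, summary: str, task_context: str) -> Path:
    """Write the architecture summary followed by the task context to the tool's file."""
    target = repo / TASK_FILE[tool]
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(f"{summary}\n\n{task_context}\n", encoding="utf-8")
    return target
```

Calling `write_task_context(Path("."), "cursor", summary, task_context)` would create `.cursor/rules/cram-task.md` in the current repo, with the summary first.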

---

## Running tests

```bash
pip install pytest
pytest
```

57 passing tests, no API key required. All model calls are mocked.
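
For readers unfamiliar with the pattern, this is roughly what a test with a mocked model call looks like. It is a generic sketch using `unittest.mock`, not a test from cram's suite: `select_files` and its `ask_model` parameter are hypothetical stand-ins for whatever function sends the prompt.

```python
from unittest.mock import Mock


def select_files(task: str, ask_model) -> list[str]:
    """Ask the model which files matter and return the paths from its reply."""
    reply = ask_model(f"Task: {task}")
    return [line.split("|")[0].strip() for line in reply.splitlines() if line.strip()]


def test_select_files_uses_mocked_model():
    # The mock stands in for the network call, so no API key is needed.
    fake_model = Mock(return_value="api/routes.py | check_throttle\n")
    assert select_files("fix the rate limiter", fake_model) == ["api/routes.py"]
    fake_model.assert_called_once_with("Task: fix the rate limiter")
```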

---

## License

MIT
