Metadata-Version: 2.4
Name: claude-context-compiler
Version: 0.1.0
Summary: Local context compiler for AI coding assistants — smallest correct context bundle with rationale
Project-URL: Repository, https://github.com/punakkals/context-compiler
Author: Punakkals
License: Apache-2.0
Keywords: claude,code-intelligence,context,llm,mcp,tree-sitter
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: Apache Software License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Requires-Python: >=3.11
Requires-Dist: click>=8.0
Requires-Dist: fastmcp>=2.0
Requires-Dist: kuzu>=0.7
Requires-Dist: pydantic>=2.0
Requires-Dist: rank-bm25>=0.2.2
Requires-Dist: rapidfuzz>=3.0
Requires-Dist: tree-sitter-python>=0.23
Requires-Dist: tree-sitter-typescript>=0.23
Requires-Dist: tree-sitter>=0.23
Provides-Extra: dev
Requires-Dist: datamodel-code-generator>=0.25; extra == 'dev'
Requires-Dist: pytest-asyncio>=0.23; extra == 'dev'
Requires-Dist: pytest>=8.0; extra == 'dev'
Requires-Dist: ruff>=0.4; extra == 'dev'
Provides-Extra: semantic
Requires-Dist: fastembed>=0.4; extra == 'semantic'
Description-Content-Type: text/markdown

# context-compiler

A local-first MCP server that indexes your Python and TypeScript codebase into a dependency graph and returns the **smallest correct context bundle** for any coding task — with a one-line rationale for every included file.

No cloud. No LLM API calls. No data leaves your machine.

---

## The problem

When you ask Claude to fix a bug or add a feature, it guesses which files are relevant and reads those. It either over-reads, which wastes tokens, or misses the file that actually matters. The bigger the codebase, the worse this gets.

## How it works

```
Your task: "fix the keycloak token expiry"
         ↓
  Classify → BUG_FIX
         ↓
  Find entry nodes → keycloak.py (BM25 + docstring matching)
         ↓
  Traverse graph → keycloak.py + secured_view.py + test_keycloak_steps.py
         ↓
  Score + budget → 870 tokens (within 8000 limit)
         ↓
  Return bundle with rationale per file
```

Everything — classification, traversal, scoring, rationale — is deterministic. Same repo + same task = same bundle, every time.
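
To illustrate the "find entry nodes" step and why it is repeatable, here is a minimal sketch of BM25 ranking in plain Python. The project depends on `rank-bm25`; this sketch reimplements the scoring by hand so it runs without it, and is not the project's actual indexing code. The `DOCS` contents, the `tokenize` and `bm25_rank` names, and the `k1`/`b` values are assumptions chosen for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase and split on non-alphanumerics (this also splits snake_case)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def bm25_rank(query: str, docs: dict[str, str],
              k1: float = 1.5, b: float = 0.75) -> list[tuple[str, float]]:
    """Score every document against the query with Okapi BM25, best first."""
    if not docs:
        return []
    tokens = {name: tokenize(text) for name, text in docs.items()}
    n_docs = len(tokens)
    avg_len = sum(len(t) for t in tokens.values()) / n_docs or 1.0
    # Document frequency: how many documents contain each term.
    df = Counter(term for t in tokens.values() for term in set(t))

    scores = {}
    for name, doc_tokens in tokens.items():
        tf = Counter(doc_tokens)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1.0)
            norm = tf[term] + k1 * (1 - b + b * len(doc_tokens) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores[name] = score
    # Sort by score, then by name, so ties break the same way on every run.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical per-file text built from symbol names and docstrings.
DOCS = {
    "admin/keycloak.py": "Keycloak refresh_token token expiry validate session",
    "admin/views/secured_view.py": "SecuredView has_role check permissions",
    "utils/retry.py": "retry with backoff on network error",
}

for name, score in bm25_rank("fix the keycloak token expiry", DOCS):
    print(f"{score:.3f}  {name}")
```

Because the function has no randomness and breaks ties by file name, calling it twice with the same inputs returns the same ranking, which is the property the pipeline relies on.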

---

## Installation

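```bash
# The package is published as claude-context-compiler; the command is context-compiler.

# Index a repository
uvx --from claude-context-compiler context-compiler index --repo ./my-project

# Preview what context a task would produce
uvx --from claude-context-compiler context-compiler explain --repo ./my-project --task "fix the retry logic"

# Start the MCP server (Claude Code does this automatically)
uvx --from claude-context-compiler context-compiler serve --repo ./my-project
```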

Requires Python 3.11+.

### Optional: semantic fallback

Install the optional fastembed model (23 MB ONNX, no PyTorch) for better matching when the task's terms don't appear in symbol names:

```bash
pip install "claude-context-compiler[semantic]"
```

---

## Claude Code integration

**1. Register the MCP server:**

```bash
claude mcp add --scope user context-compiler -- uvx --from claude-context-compiler context-compiler serve --repo /path/to/your/repo
```

**2. Add to your repo's `CLAUDE.md`:**

```markdown
## Context retrieval

Before reading any source files, call `get_context` with the task description.
Read only the files it returns.
```

**3. Use it:**

```
> Fix the keycloak token expiry bug
```

Claude calls `get_context("fix the keycloak token expiry bug")`, gets back the exact files to read, and starts working — no guessing.

---

## MCP tools

### `get_context(task, budget=8000)`

Returns the minimal file bundle for a coding task.

```json
{
  "files": ["admin/keycloak.py", "admin/views/secured_view.py"],
  "rationale": [
    "Included Keycloak as primary task location (matched 'keycloak')",
    "Included SecuredView._has_role because it is called by Keycloak (depth 1)"
  ],
  "token_estimate": 870,
  "tokens_saved": 0,
  "task_type": "BUG_FIX",
  "confidence": 1.0
}
```
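
A client that wants to inspect the bundle can load this response into a typed structure. The sketch below uses only the fields shown in the sample; the `ContextBundle` and `parse_bundle` names are hypothetical, and pairing `files` with `rationale` by position is an assumption based on the one-rationale-per-file description.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class ContextBundle:
    """Mirror of the sample get_context response (hypothetical type name)."""
    files: list[str]
    rationale: list[str]
    token_estimate: int
    tokens_saved: int
    task_type: str
    confidence: float


def parse_bundle(raw: str) -> ContextBundle:
    """Parse a JSON response; raises TypeError on missing or unknown fields."""
    return ContextBundle(**json.loads(raw))


RAW = """
{
  "files": ["admin/keycloak.py", "admin/views/secured_view.py"],
  "rationale": [
    "Included Keycloak as primary task location (matched 'keycloak')",
    "Included SecuredView._has_role because it is called by Keycloak (depth 1)"
  ],
  "token_estimate": 870,
  "tokens_saved": 0,
  "task_type": "BUG_FIX",
  "confidence": 1.0
}
"""

bundle = parse_bundle(RAW)
# Assumes rationale[i] explains files[i].
for path, why in zip(bundle.files, bundle.rationale):
    print(f"{path}: {why}")
```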

### `refresh(changed_files)`

Re-indexes the repository after files change. Pass the changed paths as `changed_files`.

---

## What makes it different

**Task-type-aware traversal.** A bug fix traverses inbound callers and test coverage at depth 2. A new feature traverses imports and sibling modules. A refactor traverses everything at depth 3. The retrieval strategy follows what you're actually trying to do, not just which terms the task mentions.
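
One way to picture this is a policy table plus a depth-limited breadth-first search. The sketch below is an assumption-laden illustration, not the project's implementation: the edge names, the `POLICY` table, the `NEW_FEATURE` depth of 1, and the toy `GRAPH` are all hypothetical, and the real graph lives in KuzuDB, not in a dict.

```python
from collections import deque

# Hypothetical policy: which edge kinds to follow and how deep, per task type.
# None means "follow every edge kind".
POLICY = {
    "BUG_FIX": {"edges": {"called_by", "tested_by"}, "max_depth": 2},
    "NEW_FEATURE": {"edges": {"imports", "sibling"}, "max_depth": 1},  # depth assumed
    "REFACTOR": {"edges": None, "max_depth": 3},
}

# Toy graph: node -> list of (edge_kind, target).
GRAPH = {
    "keycloak.py": [
        ("called_by", "secured_view.py"),
        ("tested_by", "test_keycloak_steps.py"),
        ("imports", "http_client.py"),
    ],
    "secured_view.py": [("imports", "roles.py")],
    "http_client.py": [("imports", "settings.py")],
}


def traverse(graph: dict, entry: str, task_type: str) -> dict[str, int]:
    """Return every reachable node with its depth, following the task's policy."""
    policy = POLICY[task_type]
    depth_of = {entry: 0}
    queue = deque([entry])
    while queue:
        node = queue.popleft()
        if depth_of[node] == policy["max_depth"]:
            continue  # do not expand past the depth limit
        for kind, target in graph.get(node, []):
            allowed = policy["edges"] is None or kind in policy["edges"]
            if allowed and target not in depth_of:
                depth_of[target] = depth_of[node] + 1
                queue.append(target)
    return depth_of


for task_type in POLICY:
    print(task_type, traverse(GRAPH, "keycloak.py", task_type))
```

With the same entry node, the three task types reach different sets of files, which is the point of making the traversal depend on the classification.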

**Rationale per file.** Every included file has a one-line explanation of why it's there. You can see what Claude will read before it reads it.

**Hard token budget.** The bundle never exceeds the limit. Files are included whole or not at all; a file is never truncated to fit.
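
A whole-file budget can be enforced with a simple greedy pass, sketched below. This is an illustration under stated assumptions: the `pack` function, the scores, and the per-file token counts are made up, and whether the real tool skips an oversized file and keeps going (as here) or stops at the first file that does not fit is not specified.

```python
def pack(candidates: list[tuple[str, float, int]],
         budget: int = 8000) -> tuple[list[str], int]:
    """Pick whole files in descending score order without exceeding the budget.

    Each candidate is (path, score, token_count). A file that does not fit
    is skipped entirely; it is never truncated.
    """
    chosen: list[str] = []
    used = 0
    # Sort by score, then by path, so equal scores always resolve the same way.
    for path, _score, tokens in sorted(candidates, key=lambda c: (-c[1], c[0])):
        if used + tokens <= budget:
            chosen.append(path)
            used += tokens
    return chosen, used


# Illustrative numbers only.
CANDIDATES = [
    ("admin/keycloak.py", 1.0, 520),
    ("admin/views/secured_view.py", 0.6, 350),
    ("tests/test_keycloak_steps.py", 0.4, 7500),
]

files, total = pack(CANDIDATES, budget=8000)
print(files, total)
```

Here the third file would push the total to 8370 tokens, so it is left out and the bundle stays at 870.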

**Local-first.** Embedded KuzuDB graph, no server, no port, no auth. Works offline.

---

## Supported languages

| Language | Parsing | Docstrings |
|---|---|---|
| Python | tree-sitter-python | ✓ (first line of docstring) |
| TypeScript / TSX | tree-sitter-typescript | ✓ (JSDoc `/** */`) |
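
To show what "first line of docstring" means for Python, here is a small sketch that uses the standard-library `ast` module in place of tree-sitter, so it runs without extra dependencies. The `first_docstring_lines` name and the sample source are hypothetical; the project's own extraction works on tree-sitter parse trees and may differ in detail.

```python
import ast


def first_docstring_lines(source: str) -> dict[str, str]:
    """Map each class/function name to the first line of its docstring."""
    summaries: dict[str, str] = {}
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            doc = ast.get_docstring(node)  # None when there is no docstring
            if doc:
                summaries[node.name] = doc.splitlines()[0]
    return summaries


SOURCE = '''
class Keycloak:
    """Client for the Keycloak admin API.

    Handles token refresh.
    """

    def refresh_token(self):
        """Refresh the access token before it expires."""

    def _undocumented(self):
        return None
'''

print(first_docstring_lines(SOURCE))
```

Only the summary line is kept, so long docstrings contribute a single short phrase per symbol to the text used for matching.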

---

## Environment variables

| Variable | Default | Description |
|---|---|---|
| `CC_REPO_PATH` | required | Path to indexed repository |
| `CC_TOKEN_BUDGET` | `8000` | Default token budget for `get_context` |
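
As a rough sketch of how these two variables might be read at startup: the `load_settings` helper and its error message are hypothetical, and only the variable names and the `8000` default come from the table above.

```python
import os
from collections.abc import Mapping
from pathlib import Path


def load_settings(env: Mapping[str, str] = os.environ) -> tuple[Path, int]:
    """Read the repository path (required) and token budget (default 8000)."""
    repo = env.get("CC_REPO_PATH")
    if not repo:
        raise RuntimeError("CC_REPO_PATH is required")
    budget = int(env.get("CC_TOKEN_BUDGET", "8000"))
    return Path(repo), budget


# Passing a plain dict keeps the example independent of the real environment.
print(load_settings({"CC_REPO_PATH": "/path/to/your/repo"}))
print(load_settings({"CC_REPO_PATH": "/path/to/your/repo", "CC_TOKEN_BUDGET": "4000"}))
```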

---

## Tech stack

[tree-sitter](https://tree-sitter.github.io/) · [KuzuDB](https://kuzudb.com/) · [BM25 (rank-bm25)](https://github.com/dorianbrown/rank_bm25) · [rapidfuzz](https://github.com/maxbachmann/RapidFuzz) · [FastMCP](https://github.com/jlowin/fastmcp) · [fastembed](https://github.com/qdrant/fastembed) (optional)

---

## License

Apache 2.0
