Metadata-Version: 2.4
Name: ctxmap
Version: 0.1.0
Summary: Self-maintaining context map for AI coding assistants. Replaces CLAUDE.md with a file that updates itself on every git commit.
Project-URL: Homepage, https://github.com/avstrix/ctxmap
Project-URL: Repository, https://github.com/avstrix/ctxmap
Project-URL: Issues, https://github.com/avstrix/ctxmap/issues
Project-URL: Changelog, https://github.com/avstrix/ctxmap/blob/main/CHANGELOG.md
License: MIT
License-File: LICENSE
Keywords: ai-coding,claude,claude-code,codebase-context,context,copilot,cursor,knowledge-graph,mcp,token-reduction,tree-sitter
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Software Development :: Documentation
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Requires-Python: >=3.10
Requires-Dist: click>=8.1
Requires-Dist: fastmcp>=2.0
Requires-Dist: httpx>=0.27
Requires-Dist: networkx>=3.2
Requires-Dist: rich>=13.0
Requires-Dist: tree-sitter-go>=0.23
Requires-Dist: tree-sitter-java>=0.23
Requires-Dist: tree-sitter-javascript>=0.23
Requires-Dist: tree-sitter-python>=0.23
Requires-Dist: tree-sitter-rust>=0.23
Requires-Dist: tree-sitter-typescript>=0.23
Requires-Dist: tree-sitter>=0.23
Requires-Dist: watchdog>=4.0
Provides-Extra: all
Requires-Dist: anthropic>=0.30; extra == 'all'
Requires-Dist: faster-whisper>=1.0; extra == 'all'
Requires-Dist: graspologic>=3.3; extra == 'all'
Requires-Dist: sentence-transformers>=3.0; extra == 'all'
Requires-Dist: yt-dlp>=2024.1; extra == 'all'
Provides-Extra: communities
Requires-Dist: graspologic>=3.3; extra == 'communities'
Provides-Extra: dev
Requires-Dist: mypy>=1.10; extra == 'dev'
Requires-Dist: pytest-asyncio>=0.23; extra == 'dev'
Requires-Dist: pytest>=8.0; extra == 'dev'
Requires-Dist: ruff>=0.4; extra == 'dev'
Provides-Extra: embeddings
Requires-Dist: sentence-transformers>=3.0; extra == 'embeddings'
Provides-Extra: semantic
Requires-Dist: anthropic>=0.30; extra == 'semantic'
Provides-Extra: video
Requires-Dist: faster-whisper>=1.0; extra == 'video'
Requires-Dist: yt-dlp>=2024.1; extra == 'video'
Description-Content-Type: text/markdown

# ctxmap

**A self-maintaining context map for AI coding assistants.**

[![PyPI](https://img.shields.io/pypi/v/ctxmap?style=flat-square)](https://pypi.org/project/ctxmap/)
[![CI](https://github.com/avstrix/ctxmap/actions/workflows/ci.yml/badge.svg)](https://github.com/avstrix/ctxmap/actions/workflows/ci.yml)
[![Python 3.10+](https://img.shields.io/badge/python-3.10%2B-blue?style=flat-square)](https://www.python.org/)
[![MIT License](https://img.shields.io/badge/license-MIT-yellow?style=flat-square)](LICENSE)

ctxmap solves one problem: **AI coding assistants re-read your entire codebase from scratch every session.**

It generates a single `CONTEXT.md` — a compressed, always-current map of your repo — that any AI reads at session start instead of scanning files. Updated automatically on every git commit, rewriting only the sections that actually changed.

```bash
pip install ctxmap
ctxmap install    # adds CONTEXT.md reference to CLAUDE.md, sets git hook
ctxmap build      # first run — generates CONTEXT.md (~500 tokens)
# from now on: auto-updates on every git commit
```

---

## The problem

Every time you start a Claude Code session, Claude re-reads the same files to understand your project. Your API layer hasn't changed in months. Your Redux pattern hasn't changed. Your test setup hasn't changed. But Claude re-scans everything anyway — spending thousands of tokens on context that was stable yesterday and will be stable tomorrow.

`/init` makes this worse: it regenerates your entire CLAUDE.md from scratch every run, overwrites your manual notes, and produces a file that's either too long or missing what actually matters.

---

## How it works

```
git commit
  → diff file hashes          (instant, no LLM)
  → identify stale sections   (instant)
  → rewrite only those        (~300–600 tokens)
  → Claude reads CONTEXT.md next session (~500 tokens, fixed)
```

`CONTEXT.md` has six sections, each independently tracked:

```markdown
## Project        ← written once, updated only when README changes
## Architecture   ← updated when file/module structure changes
## Conventions    ← updated when code patterns change
## Hot files      ← updated each commit (most-connected files)
## Recent         ← rolling 7-day git log, auto-rotated
## Notes          ← your manual section, NEVER touched by ctxmap
```

Each section stores a hash of its source inputs in `.ctxmap/section_hashes.json`. On every `ctxmap update`, only sections whose inputs changed are rewritten. For a typical commit touching 3 files, only `Recent` is rewritten (~200 tokens); everything else stays cached.
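The caching idea is simple enough to sketch. Assuming hypothetical helper names (ctxmap's actual internals may differ), a section is rewritten only when the SHA-256 of its source inputs no longer matches the stored hash:

```python
import hashlib
import json
from pathlib import Path

def hash_inputs(paths: list[str]) -> str:
    """SHA-256 over the concatenated contents of a section's source files."""
    h = hashlib.sha256()
    for p in sorted(paths):
        h.update(Path(p).read_bytes())
    return h.hexdigest()

def stale_sections(sections: dict[str, list[str]], store: Path) -> list[str]:
    """Return section names whose input hashes changed since the last run.

    `sections` maps section name -> list of source files; `store` is the
    JSON file of previously recorded hashes (e.g. .ctxmap/section_hashes.json).
    """
    old = json.loads(store.read_text()) if store.exists() else {}
    return [name for name, paths in sections.items()
            if hash_inputs(paths) != old.get(name)]
```

A section with no recorded hash counts as stale, so the first run rewrites everything and subsequent runs only touch what changed.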

---

## What CONTEXT.md looks like

```markdown
# myapp — context map
*Generated by ctxmap. Last updated: 2026-04-21.*

## Project
**Repo:** `myapp`
**Language:** TypeScript/React
**Scale:** 47 files · 612 symbols · 2,341 relationships
**About:** Brand management dashboard for enterprise clients.

## Architecture
**Module map:**
- `PrivateAPI.js` (12 fns) → auth, config
- `store/index.js` (8 fns) → reducers, middleware
- `components/Dashboard.jsx` (6 fns) → api, hooks

**Key classes:**
- `AuthManager` in `auth/manager.js` L14
- `APIClient` in `PrivateAPI.js` L8

## Conventions
- Private functions use `_prefix` convention
- Tests use `test_*` naming (34 test functions detected)
- Test files: `src/__tests__/unit/`, `src/__tests__/integration/`
- NOTE: MemoPromise used for API deduplication
- WHY: isSuccess() checks response.metadata.status not HTTP status

## Hot files
- `PrivateAPI.js` — 41 connections
- `store/reducers.js` — 28 connections
- `auth/manager.js` — 19 connections

## Recent
**2026-04-21** — fix: resolve auth token refresh race condition
  - `PrivateAPI.js`
  - `auth/tokenManager.js`

## Notes
<!-- This section is yours. ctxmap never touches it. -->
- Use renderWithProviders from testUtils.jsx for all component tests
- MUI v4 for old components, v5 for new — don't mix in same file
- allowAuthHeaders is dead code in PrivateAPI.js, ignore it
```

---

## Workflow

![ctxmap workflow](docs/images/ctxmap_how_it_works.svg)

---

## Real-world benchmark

Tested on [gstin-health](https://github.com/avstrix/gstin-health), a Next.js/TypeScript project:

| | Tokens |
|---|---|
| Raw codebase (all .ts/.tsx/.json/.md files) | 57,003 |
| CONTEXT.md | 640 |
| Reduction | **98.9% · 89x smaller** |

> Token counts measured with tiktoken (cl100k_base). Real savings depend on how much of your codebase a given session would otherwise scan — smaller repos see smaller absolute savings, larger repos see larger ones.
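The reduction figures in the table follow directly from the two token counts:

```python
raw, ctx = 57_003, 640      # tokens: full codebase vs CONTEXT.md

reduction = 1 - ctx / raw   # fraction of tokens saved
factor = raw / ctx          # how many times smaller

print(f"{reduction:.1%} reduction, {factor:.0f}x smaller")
# → 98.9% reduction, 89x smaller
```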

More importantly: answer quality. Same question, two sessions:

**Without CONTEXT.md** — generic Next.js boilerplate guess: `package.json`, `app/page.tsx`, `app/layout.tsx`.

**With CONTEXT.md** — specific, accurate, prioritised: `lib/gstApi.ts` (31 connections, core of everything), `lib/scoreEngine.ts` (business logic), `app/api/gstin/[gstin]/route.ts` (API layer) — in the right order, with the right reasons.

---

## Token economics

| Operation | Cost | Frequency |
|---|---|---|
| First build | ~5,000 tokens | Once |
| Typical commit (3 files changed) | ~300–600 tokens | Per commit |
| Session start (Claude reads CONTEXT.md) | ~500 tokens | Per session |
| Without ctxmap (Claude scans project) | ~5,000–57,000 tokens | Per session |

---

## Why CONTEXT.md beats CLAUDE.md

| | CLAUDE.md + `/init` | ctxmap |
|---|---|---|
| Update model | Full rebuild every time | Patch only changed sections |
| Preserves manual notes | No — overwritten on `/init` | Yes — `## Notes` is locked |
| Per-session cost | Variable, unbounded | ~500 tokens, fixed |
| Per-update cost | Full LLM rebuild | ~300–600 tokens |
| Works with any AI | Yes | Yes — just a markdown file |
| Understands your codebase | Generic output | Detects your actual patterns |

---

## CLI reference

```bash
ctxmap build              # full build: structural graph + CONTEXT.md
ctxmap update             # incremental: re-parse changed files, patch sections
ctxmap context            # regenerate CONTEXT.md only
ctxmap context --force    # rewrite all sections unconditionally
ctxmap status             # graph stats + token count
ctxmap install            # configure CLAUDE.md + git hook
ctxmap serve              # start MCP server for deep queries

# Deep graph queries (optional)
ctxmap query "auth flow"
ctxmap explain UserService
ctxmap path LoginHandler Response
ctxmap semantic           # LLM extraction on docs/images (needs ANTHROPIC_API_KEY)
ctxmap watch              # auto-update on file changes
```

---

## Platform support

`ctxmap install` auto-detects your AI tool and writes the correct config:

| Platform | What gets written |
|---|---|
| Claude Code | `.mcp.json` + `CLAUDE.md` section + `~/.claude/skills/ctxmap/` |
| Cursor | `.cursor/rules/ctxmap.mdc` (alwaysApply) |
| Codex | `AGENTS.md` section + skill file |
| Aider / OpenCode / Windsurf | `AGENTS.md` section |

Or target a specific platform:
```bash
ctxmap install --platform cursor
ctxmap install --platform claude-code
```
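The hook side of `ctxmap install` amounts to dropping a small script into `.git/hooks/`. A minimal sketch of the idea (the function name and hook body here are illustrative, not ctxmap's actual implementation):

```python
import stat
from pathlib import Path

HOOK_BODY = """#!/bin/sh
# Refresh stale CONTEXT.md sections after each commit.
ctxmap update
"""

def install_post_commit_hook(repo_root: str) -> Path:
    """Write an executable post-commit hook into the repo's .git/hooks/."""
    hook = Path(repo_root) / ".git" / "hooks" / "post-commit"
    hook.parent.mkdir(parents=True, exist_ok=True)
    hook.write_text(HOOK_BODY)
    # chmod +x so git will actually run the hook
    hook.chmod(hook.stat().st_mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)
    return hook
```

A production installer should append to (or back up) an existing post-commit hook rather than overwrite it, which is part of what `ctxmap install` handles for you.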

---

## Optional MCP server

For deep queries during a session, ctxmap also runs as an MCP server:

```bash
ctxmap serve   # stdio transport — add to .mcp.json
```
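`ctxmap install` writes the MCP entry for you; if you prefer to configure it by hand, a `.mcp.json` entry would look roughly like this (exact schema depends on your MCP client):

```json
{
  "mcpServers": {
    "ctxmap": {
      "command": "ctxmap",
      "args": ["serve"]
    }
  }
}
```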

Available tools: `get_blast_radius`, `query_graph`, `get_node`, `get_path`,
`get_god_nodes`, `get_surprising_connections`, `get_architecture_overview`,
`run_semantic`, `export_graph`.

MCP is optional depth — CONTEXT.md handles the common case without any tooling.
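Conceptually, `get_blast_radius` answers "if I change this file, what can break?": the set of transitive dependents in the relationship graph. ctxmap keeps its graph in NetworkX; a dependency-free sketch of the same idea (illustrative data, not ctxmap's API):

```python
from collections import deque

def blast_radius(reverse_deps: dict[str, set[str]], changed: str) -> set[str]:
    """All nodes that transitively depend on `changed`.

    `reverse_deps` maps a node to the set of nodes that import/call it.
    """
    seen: set[str] = set()
    queue = deque([changed])
    while queue:
        node = queue.popleft()
        for dependent in reverse_deps.get(node, ()):
            if dependent not in seen:
                seen.add(dependent)
                queue.append(dependent)
    return seen

deps = {
    "auth/manager.js": {"PrivateAPI.js"},
    "PrivateAPI.js": {"store/index.js", "components/Dashboard.jsx"},
}
print(sorted(blast_radius(deps, "auth/manager.js")))
# → ['PrivateAPI.js', 'components/Dashboard.jsx', 'store/index.js']
```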

---

## Optional dependencies

```bash
pip install ctxmap[semantic]     # LLM extraction for docs/images (Anthropic)
pip install ctxmap[video]        # video/audio transcription (Whisper + yt-dlp)
pip install ctxmap[communities]  # Leiden community detection
pip install ctxmap[embeddings]   # vector similarity search
pip install ctxmap[all]          # everything
```

---

## Privacy

- Code files: processed locally via Tree-sitter — nothing leaves your machine
- Docs/images: sent to Anthropic API only when you explicitly run `ctxmap semantic`
- Video/audio: transcribed locally with faster-whisper — never leaves your machine
- No telemetry, no analytics, no tracking of any kind

---

## Architecture

```
ctxmap/
├── context.py     CONTEXT.md generator — section-level diff updates  ← core
├── parser.py      Tree-sitter AST extraction (structural, no LLM)
├── store.py       SQLite + NetworkX unified store
├── builder.py     SHA-256 incremental build/update/watch
├── semantic.py    LLM extraction with per-file hash cache
├── analysis.py    God nodes, blast radius, surprising connections
├── server.py      FastMCP server + MCP tools
├── installer.py   Platform auto-detection + git hook installation
└── cli.py         CLI entry point
```

See [ARCHITECTURE.md](ARCHITECTURE.md) for full details.

---

## Contributing

```bash
git clone https://github.com/avstrix/ctxmap
cd ctxmap
python -m venv .venv && source .venv/bin/activate
pip install -e ".[dev]"
pytest tests/ -q
```

To improve a section: each `_build_*_section()` in `context.py` is independent and easy to extend.

Worked examples are the most useful contribution — run ctxmap on a real project, save `CONTEXT.md` output, write notes on what was useful vs missing, submit a PR.

MIT License.
