Metadata-Version: 2.4
Name: code-atlas-mcp
Version: 0.3.1
Summary: A code intelligence graph that gives AI coding agents deep, token-efficient understanding of your codebase.
Project-URL: Homepage, https://github.com/SerPeter/code-atlas
Project-URL: Documentation, https://github.com/SerPeter/code-atlas/tree/main/docs
Project-URL: Repository, https://github.com/SerPeter/code-atlas
Project-URL: Issues, https://github.com/SerPeter/code-atlas/issues
Project-URL: Changelog, https://github.com/SerPeter/code-atlas/blob/main/CHANGELOG.md
Author: SerPeter
License-Expression: Apache-2.0
License-File: LICENSE
Keywords: ai-agents,ast,code-graph,code-intelligence,code-navigation,code-search,graph-database,mcp,memgraph,rag,semantic-search,tree-sitter
Classifier: Development Status :: 3 - Alpha
Classifier: Environment :: Console
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: Apache Software License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.14
Classifier: Topic :: Software Development :: Libraries
Classifier: Topic :: Software Development :: Quality Assurance
Classifier: Typing :: Typed
Requires-Python: >=3.14
Requires-Dist: httpx~=0.28
Requires-Dist: litellm~=1.81
Requires-Dist: loguru~=0.7
Requires-Dist: mcp~=1.26
Requires-Dist: neo4j~=6.1
Requires-Dist: numpy~=2.4
Requires-Dist: orjson>=3.10
Requires-Dist: pathspec~=1.0
Requires-Dist: pydantic-settings~=2.12
Requires-Dist: pydantic~=2.12
Requires-Dist: python-dotenv~=1.1
Requires-Dist: redis[hiredis]~=7.1
Requires-Dist: tenacity~=9.1
Requires-Dist: tiktoken~=0.12
Requires-Dist: tree-sitter-markdown~=0.5
Requires-Dist: tree-sitter-python~=0.23
Requires-Dist: tree-sitter~=0.25
Requires-Dist: typer~=0.21
Requires-Dist: watchfiles~=1.0
Provides-Extra: all-languages
Requires-Dist: tree-sitter-c-sharp~=0.23; extra == 'all-languages'
Requires-Dist: tree-sitter-cpp~=0.23; extra == 'all-languages'
Requires-Dist: tree-sitter-c~=0.24; extra == 'all-languages'
Requires-Dist: tree-sitter-go~=0.25; extra == 'all-languages'
Requires-Dist: tree-sitter-javascript~=0.25; extra == 'all-languages'
Requires-Dist: tree-sitter-java~=0.23; extra == 'all-languages'
Requires-Dist: tree-sitter-php~=0.24; extra == 'all-languages'
Requires-Dist: tree-sitter-ruby~=0.23; extra == 'all-languages'
Requires-Dist: tree-sitter-rust~=0.24; extra == 'all-languages'
Requires-Dist: tree-sitter-typescript~=0.23; extra == 'all-languages'
Provides-Extra: cpp
Requires-Dist: tree-sitter-cpp~=0.23; extra == 'cpp'
Requires-Dist: tree-sitter-c~=0.24; extra == 'cpp'
Provides-Extra: csharp
Requires-Dist: tree-sitter-c-sharp~=0.23; extra == 'csharp'
Provides-Extra: go
Requires-Dist: tree-sitter-go~=0.25; extra == 'go'
Provides-Extra: java
Requires-Dist: tree-sitter-java~=0.23; extra == 'java'
Provides-Extra: otel
Requires-Dist: opentelemetry-api~=1.33; extra == 'otel'
Requires-Dist: opentelemetry-exporter-otlp-proto-grpc~=1.33; extra == 'otel'
Requires-Dist: opentelemetry-exporter-otlp-proto-http~=1.33; extra == 'otel'
Requires-Dist: opentelemetry-sdk~=1.33; extra == 'otel'
Provides-Extra: php
Requires-Dist: tree-sitter-php~=0.24; extra == 'php'
Provides-Extra: ruby
Requires-Dist: tree-sitter-ruby~=0.23; extra == 'ruby'
Provides-Extra: rust
Requires-Dist: tree-sitter-rust~=0.24; extra == 'rust'
Provides-Extra: typescript
Requires-Dist: tree-sitter-javascript~=0.25; extra == 'typescript'
Requires-Dist: tree-sitter-typescript~=0.23; extra == 'typescript'
Description-Content-Type: text/markdown

# Code Atlas

**A code intelligence graph that gives AI coding agents deep, token-efficient understanding of your codebase — structure, docs, and dependencies in one searchable graph.**

> Map your codebase. Search it three ways. Feed it to agents.

[![License](https://img.shields.io/badge/License-Apache_2.0-blue.svg)](https://opensource.org/licenses/Apache-2.0)
[![Python](https://img.shields.io/badge/python-3.14+-blue.svg)](https://www.python.org/downloads/)
[![Ruff](https://img.shields.io/endpoint?url=https://raw.githubusercontent.com/astral-sh/ruff/main/assets/badge/v2.json)](https://github.com/astral-sh/ruff)
[![MCP](https://img.shields.io/badge/MCP-compatible-green.svg)](https://modelcontextprotocol.io/)

---

## The Problem

Every time an AI agent touches your codebase, it burns tokens just figuring out where things are. Grep for a function name. Read five files to understand the call chain. Search docs for context. Repeat — across every task, every session. On a large project, agents can spend **30–50% of their context window** on orientation before they write a single line of code.

Many tools solve one piece of this: semantic search, or graph traversal, or keyword lookup. But a developer doesn't understand a codebase through one lens — they build a **mental model** that connects structure, meaning, and names simultaneously. Agents need the same thing.

Code Atlas is that mental model, externalized as a graph.

## What Is This?

Code Atlas builds a **graph database** of your entire codebase — code structure, documentation, and dependencies — and exposes it via **MCP tools** that AI coding agents can use to understand, navigate, and reason about your code.

Three search types, one system:

- **Graph traversal** — follow relationships: who calls this function? What does this class inherit from? What services depend on this library?
- **Semantic search** — find code by meaning: "authentication middleware" finds relevant code even if it's named `verify_token_chain`
- **BM25 keyword search** — exact matches: find that specific error message, config key, or function name

All powered by [Memgraph](https://memgraph.com/) as a single backend.
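
The `hybrid_search` tool listed under MCP Tools fuses these three result lists via RRF (reciprocal rank fusion). The snippet below is a minimal sketch of weighted RRF, not Code Atlas's actual implementation: the function name, the constant `k=60`, the equal default weights, and the entity ids are all assumptions made for illustration.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, weights=None, k=60):
    """Fuse several ranked lists of ids into one list, best first.

    Each id scores sum(weight / (k + rank)) over the lists it appears in,
    where rank starts at 1 for the top hit of a list.
    """
    weights = weights or [1.0] * len(rankings)
    scores = defaultdict(float)
    for ranking, weight in zip(rankings, weights):
        for rank, entity_id in enumerate(ranking, start=1):
            scores[entity_id] += weight / (k + rank)
    # Highest score first; ties broken by id so the output is deterministic.
    return sorted(scores, key=lambda entity_id: (-scores[entity_id], entity_id))


# Hypothetical entity ids returned by each search channel.
graph_hits = ["auth.verify_token_chain", "auth.Session"]
bm25_hits = ["auth.login", "auth.verify_token_chain"]
vector_hits = ["auth.verify_token_chain", "auth.login", "auth.Middleware"]

fused = reciprocal_rank_fusion([graph_hits, bm25_hits, vector_hits])
print(fused)
```

An entity that ranks well in several channels rises to the top even if no single channel ranks it first everywhere. Passing different `weights` is one way a system could favor, say, keyword matches for queries that look like identifiers.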

## Key Features

- **Monorepo-native** — auto-detects sub-projects, tracks cross-project dependencies, and supports scoped queries
- **Documentation as first-class** — indexes markdown docs, ADRs, and READMEs with links to the code they describe
- **AST-level incremental indexing** — only re-indexes the entities that actually changed, not entire files
- **Pattern detection** — pluggable detectors for decorator routing, event handlers, DI, test→code mappings, and more
- **Library awareness** — lightweight stubs for external dependencies, full indexing for internal libraries
- **Self-hosted** — runs locally with Docker. No data leaves your machine
- **No additional API costs** — agent-first design means all intelligence runs through your existing subscription; local embeddings via TEI, no extra API keys
- **Token-efficient** — budget-aware context assembly that prioritizes what matters most
- **Pluggable AI** — TEI for embeddings, LiteLLM for LLM calls, or bring your own
- **MCP server** — works with Claude Code, Cursor, Windsurf, or any MCP-compatible client
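
To make the idea of AST-level incremental indexing more concrete, here is a rough sketch that fingerprints each function and class and reports only the entities whose fingerprint changed between two versions of a file. It is an illustration under assumptions: Code Atlas parses with tree-sitter, whereas this example uses Python's standard `ast` module as a stand-in, and the helper names `entity_fingerprints` and `changed_entities` are hypothetical.

```python
import ast
import hashlib

ENTITY_NODES = (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)


def entity_fingerprints(source: str) -> dict[str, str]:
    """Map each function/class qualified name to a hash of its AST."""
    fingerprints: dict[str, str] = {}

    def visit(node: ast.AST, prefix: str) -> None:
        for child in ast.iter_child_nodes(node):
            if isinstance(child, ENTITY_NODES):
                name = f"{prefix}{child.name}"
                # ast.dump() leaves out line/column attributes by default, so
                # moving an entity or editing a comment keeps its hash stable.
                dumped = ast.dump(child)
                fingerprints[name] = hashlib.sha256(dumped.encode()).hexdigest()
                visit(child, f"{name}.")  # pick up methods and nested defs

    visit(ast.parse(source), "")
    return fingerprints


def changed_entities(old_source: str, new_source: str) -> dict[str, list[str]]:
    """Compare two versions of a file at entity granularity."""
    old = entity_fingerprints(old_source)
    new = entity_fingerprints(new_source)
    return {
        "added": sorted(new.keys() - old.keys()),
        "removed": sorted(old.keys() - new.keys()),
        "modified": sorted(n for n in new.keys() & old.keys() if new[n] != old[n]),
    }


old_version = "def a():\n    return 1\n\ndef b():\n    return 2\n"
new_version = (
    "# a new comment shifts every line number\n"
    "def a():\n    return 1\n\ndef b():\n    return 3\n\ndef c():\n    pass\n"
)
diff = changed_entities(old_version, new_version)
print(diff)
```

In this example only `b` and `c` would need re-indexing; `a` is untouched even though its position in the file moved. Note that in this sketch a class's fingerprint covers its methods, so editing one method marks both the method and its class as modified.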

## How Does This Compare?

Several excellent tools exist in this space — graph-based analyzers, semantic search engines, wiki generators, and IDE-integrated indexers. Code Atlas builds on their ideas while addressing a gap: no single tool combines graph traversal, semantic search, and BM25 keyword search with documentation intelligence and MCP exposure.

For a detailed comparison covering DeepWiki, Cursor, Sourcegraph Cody, Kit, code-graph-rag, codegraph-rust, and more, see [docs/landscape.md](docs/landscape.md).

## MCP Tools

15 tools exposed via the [Model Context Protocol](https://modelcontextprotocol.io/), designed to minimize context window overhead.

| Tool                   | What it does                                                                                 | Search | Full | Latency (avg / p95) |
| ---------------------- | -------------------------------------------------------------------------------------------- | -----: | ---: | ------------------: |
| **Search**             |                                                                                              |        |      |                     |
| `hybrid_search`        | **Primary tool** — fuses graph + BM25 + vector via RRF. Auto-adjusts weights by query shape. |   ~117 | ~497 |        548 / 677 ms |
| `text_search`          | BM25 keyword search. Quoted phrases, wildcards, field-specific queries.                      |    ~90 | ~275 |          34 / 36 ms |
| `vector_search`        | Semantic similarity via embeddings. Finds code by meaning, not name.                         |    ~67 | ~297 |        102 / 125 ms |
| `get_node`             | Find entities by name. Cascade: exact (uid + name) → partial (suffix > prefix > contains).   |   ~100 | ~254 |            7 / 8 ms |
| **Navigation**         |                                                                                              |        |      |                     |
| `get_context`          | Expand a node's neighborhood: parent, siblings, callers, callees, docs.                      |    ~64 | ~273 |          34 / 36 ms |
| `cypher_query`         | Run read-only Cypher against the graph. Auto-limited, write-protected.                       |    ~59 | ~168 |            3 / 3 ms |
| **Analysis**           |                                                                                              |        |      |                     |
| `analyze_repo`         | Structure, centrality, dependencies, pattern, or quality analysis.                           |    ~41 | ~266 |          22 / 23 ms |
| `generate_diagram`     | Mermaid diagrams: packages, imports, inheritance, module detail.                             |    ~37 | ~254 |            3 / 3 ms |
| **Guidance**           |                                                                                              |        |      |                     |
| `get_usage_guide`      | Quick-start or topic-specific guidance for the agent.                                        |    ~35 | ~106 |        < 1 / < 1 ms |
| `plan_search_strategy` | Recommends which search tool + params for a question.                                        |    ~40 |  ~97 |        < 1 / < 1 ms |
| `validate_cypher`      | Catches Cypher errors before execution.                                                      |    ~58 | ~116 |            1 / 2 ms |
| `schema_info`          | Full graph schema: labels, relationships, Cypher examples.                                   |    ~75 |  ~96 |        < 1 / < 1 ms |
| **Status**             |                                                                                              |        |      |                     |
| `index_status`         | Projects, entity counts, schema version, index health.                                       |    ~72 |  ~93 |          22 / 23 ms |
| `list_projects`        | Monorepo project list with dependency relationships.                                         |    ~56 |  ~77 |          12 / 13 ms |
| `health_check`         | Infrastructure diagnostics: Memgraph, TEI, Valkey, schema.                                   |    ~55 |  ~76 |        218 / 264 ms |

Token counts measured from MCP JSON tool definitions (tiktoken cl100k_base). **Search** = name + description (~966 total); **Full** = name + description + parameter schema with field descriptions, enums, and constraints (~2,945 total). All parameters are self-documented — agents can one-shot any tool without calling `get_usage_guide` first. **Latency** measured with local TEI embeddings on the code-atlas repo (~1,400 entities), 5 iterations, warm embedding cache. See `scripts/profile_query.py`.

## Quick Start

### Prerequisites

- [Docker](https://docs.docker.com/get-docker/) and Docker Compose
- [uv](https://docs.astral.sh/uv/) (Python package manager)

### 1. Start infrastructure

Download the compose file and start Memgraph + Valkey:

```bash
curl -O https://raw.githubusercontent.com/SerPeter/code-atlas/main/docker-compose.yml
docker compose up -d
```

Optional — add local embeddings (no API keys needed):

```bash
docker compose --profile tei up -d
```

### 2. Index your project

```bash
uvx --from code-atlas-mcp atlas index /path/to/your/project
uvx --from code-atlas-mcp atlas status
```

### 3. Connect to your AI agent

**Claude Code:**

```bash
claude mcp add code-atlas -- uvx --from code-atlas-mcp atlas mcp
```

**Cursor / other MCP clients** — add to your MCP config:

```json
{
  "mcpServers": {
    "code-atlas": {
      "command": "uvx",
      "args": ["--from", "code-atlas-mcp", "atlas", "mcp"]
    }
  }
}
```
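
If your client already has an MCP config with other servers in it, you may prefer to merge the entry rather than paste over the file. The following is a small sketch of that merge; the helper name and the `mcp.json` path are hypothetical, so point it at whatever config file your client actually reads.

```python
import json
import tempfile
from pathlib import Path


def add_code_atlas_server(config_path: Path) -> dict:
    """Merge the code-atlas entry into an MCP config, keeping other servers."""
    config = json.loads(config_path.read_text()) if config_path.exists() else {}
    servers = config.setdefault("mcpServers", {})
    servers["code-atlas"] = {
        "command": "uvx",
        "args": ["--from", "code-atlas-mcp", "atlas", "mcp"],
    }
    config_path.write_text(json.dumps(config, indent=2) + "\n")
    return config


# Demo against a temporary file that already lists one (made-up) server.
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "mcp.json"  # hypothetical location
    path.write_text(json.dumps({"mcpServers": {"other": {"command": "foo"}}}))
    merged = add_code_atlas_server(path)
    on_disk = json.loads(path.read_text())

print(sorted(merged["mcpServers"]))
```

The existing `other` entry is preserved and `code-atlas` is added next to it, which matches the JSON shown above.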

See the [CLI usage guide](docs/guides/usage.md) for more commands and options.

### Development

If you want to contribute or run from source:

```bash
git clone https://github.com/SerPeter/code-atlas.git
cd code-atlas
uv sync --group dev
uv run pre-commit install
```

## Performance

| Metric                     | Value                 |
| -------------------------- | --------------------- |
| Full index (107 files)     | **55s** (local TEI)   |
| Parse-only throughput      | **600–700 files/sec** |
| `get_node` / `text_search` | 7 ms / 34 ms          |
| `vector_search`            | 102 ms                |
| Concurrent QPS             | **238** (zero errors) |

Full index includes parsing, graph upserts, and embedding via local TEI (8 concurrent workers). Parse-only is raw tree-sitter CPU time without I/O. Query latencies are averages from `scripts/profile_query.py`. Full benchmark tables: [docs/benchmarks.md](docs/benchmarks.md).
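
For readers who want to reproduce the avg / p95 style of reporting used in the tool table, here is a minimal sketch of how the two numbers can be derived from a handful of timing samples. It uses the nearest-rank percentile, which is an assumption: `scripts/profile_query.py` may compute percentiles differently, and the sample values are made up rather than measured.

```python
import math
import statistics


def summarize_latencies(samples_ms: list[float]) -> tuple[float, float]:
    """Return (average, p95) using the nearest-rank percentile."""
    ordered = sorted(samples_ms)
    rank = math.ceil(0.95 * len(ordered))  # 1-based nearest rank
    return statistics.fmean(ordered), ordered[rank - 1]


# Made-up samples for illustration, not measurements from the project.
samples = [30.0, 32.0, 34.0, 36.0, 38.0]
avg, p95 = summarize_latencies(samples)
print(f"{avg:.0f} / {p95:.0f} ms")
```

With only five iterations, the nearest-rank p95 is simply the slowest sample, so treat p95 figures from small runs as a rough upper bound rather than a stable percentile.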

## Documentation

- [Architecture](docs/architecture.md) — system design, pipelines, deployment model
- [Landscape](docs/landscape.md) — code intelligence tools comparison and design rationale
- [Configuration](docs/guides/configuration.md) — atlas.toml, .atlasignore, environment variables
- [CLI Usage](docs/guides/usage.md) — indexing, searching, daemon mode
- [Benchmarks](docs/benchmarks.md) — parsing, query latency, concurrency
- [Repository Guidelines](docs/guides/repo-guidelines.md) — structure your code for better indexing

## Supporting Code Atlas

I built Code Atlas because my AI agents kept burning half their context just figuring out where things were in larger
codebases. No existing tool combined the search types I needed in one place, so I wrote one and open-sourced it so
you can benefit as well.

If Code Atlas saves you time or tokens, or makes your agents noticeably better, consider [sponsoring the project](https://github.com/sponsors/SerPeter).

[![Sponsor](https://img.shields.io/badge/Sponsor-%E2%9D%A4-pink?logo=github)](https://github.com/sponsors/SerPeter)

## License

[Apache License 2.0](LICENSE)
