Metadata-Version: 2.4
Name: ctxgraph
Version: 0.1.0
Summary: Context graph engine for AI coding assistants — build knowledge graphs, generate context capsules
Author: ctxgraph contributors
License: MIT
Keywords: code-graph,knowledge-graph,claude,code-analysis,ai-context
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Classifier: Topic :: Software Development :: Code Generators
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Requires-Python: >=3.10
Description-Content-Type: text/markdown
Requires-Dist: typer>=0.9
Requires-Dist: rich>=13.0
Provides-Extra: mcp
Requires-Dist: mcp>=1.0.0; extra == "mcp"
Requires-Dist: anyio>=4.0; extra == "mcp"
Provides-Extra: dev
Requires-Dist: pytest>=7.0; extra == "dev"
Requires-Dist: pytest-cov>=4.0; extra == "dev"
Requires-Dist: ruff>=0.1.0; extra == "dev"

# ctxgraph

**Context graph engine for AI coding assistants.** Builds a multi-layer knowledge graph from your Python codebase and generates token-efficient context capsules for Claude, OpenAI, Ollama, and other AI tools.

```bash
pip install ctxgraph

# Build knowledge graph
ctx build

# Generate context for a task (92-99% fewer tokens than raw code)
ctx capsule "fix JWT expiry in auth module"

# Launch Claude with context pre-loaded
ccg "fix the login redirect bug"

# Visualize your codebase
ctx view

# Search the graph
ctx query "auth jwt validate"
```

---

## How It Works

ctxgraph analyzes your Python codebase using static AST analysis to build a **multi-layer knowledge graph** in SQLite:

```
Repository (.py files)
    │
    ▼
┌──────────────────────────────────────────────┐
│               ctx build                        │
│                                               │
│  1. importer.py (AST)                         │
│     └── Extract imports → file-to-file edges  │
│                                               │
│  2. symbols.py (AST)                          │
│     └── Extract classes, functions, methods   │
│         calls, inheritance → symbol nodes     │
│                                               │
│  3. semantic.py (docstrings)                  │
│     └── Extract summaries → node enrichment   │
│                                               │
│  Store: SQLite (nodes + edges tables)         │
└──────────────────┬───────────────────────────┘
                   │
                   ▼
┌──────────────────────────────────────────────┐
│          Context Capsule Generation           │
│                                               │
│  1. Tokenize query → keyword search           │
│  2. Score: name matches (2x), text (0.5x)    │
│  3. BFS neighborhood expansion (depth=2)      │
│  4. Render token-efficient DSL format         │
└──────────────────────────────────────────────┘
```
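
The three build passes can be approximated with the standard-library `ast` module. The sketch below is illustrative only: the `analyze` function, its return shape, and the sample source are assumptions, not the actual contents of `importer.py`, `symbols.py`, or `semantic.py`.

```python
import ast

# Hypothetical module source used only for this illustration.
SOURCE = '''
"""Tokenize and parse math expressions."""
import re
from calc.core import add

class Expression:
    """A parsed expression."""
    def evaluate(self):
        return add(1, 2)

def tokenize(text):
    return text.split()
'''

def analyze(source: str) -> dict:
    """Collect imports, symbol names, and a one-line summary from one module."""
    tree = ast.parse(source)
    imports, symbols = [], []
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            # "import re" -> one entry per imported name
            imports.extend(alias.name for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module:
            # "from calc.core import add" -> the module being imported from
            imports.append(node.module)
        elif isinstance(node, ast.ClassDef):
            symbols.append(("class", node.name))
        elif isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            # Methods are reported as functions in this simplified sketch.
            symbols.append(("function", node.name))
    # Use the first line of the module docstring as the summary.
    doc = (ast.get_docstring(tree) or "").strip()
    summary = doc.splitlines()[0] if doc else ""
    return {"imports": sorted(imports), "symbols": sorted(symbols), "summary": summary}

result = analyze(SOURCE)
print(result)
```

Imports would become file-to-file edges, symbols would become nodes, and the summary would enrich the file node. How ctxgraph resolves module names to file paths is not shown here.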

### Architecture

```
┌─────────┐    ┌──────────────┐    ┌──────────────┐
│   CLI   │───▶│  Analyzers   │───▶│   SQLite DB  │
│  typer  │    │  AST-based   │    │  .ctxgraph/  │
└────┬────┘    └──────────────┘    └──────┬───────┘
     │                                    │
     ├── ctx build ───────────────────────▶│  Graph build
     │                                     │
     ├── ctx capsule ◀─────────────────────│  Query + BFS
     │                                     │
     ├── ctx query ◀──────────────────────│  Search
     │                                     │
     ├── ctx view ◀───────────────────────│  D3.js viz
     │                                     │
     ├── ctx serve ◀──────────────────────│  MCP server
     │                                     │
     └── ccg wrapper ───▶ Claude Code ────┘  AI tool
```

---

## Token-Efficient DSL Format

ctxgraph renders capsules in a custom DSL instead of JSON, using **~4.7× fewer tokens** on average:

```
JSON: 426 tokens                    DSL: 143 tokens
─────                               ────
{                                   [CTX]calculator expression parsing
  "nodes": [
    {                               [F]calc/parser.py
      "id": "file:calc/parser.py",    D:Tokenize and parse math expressions
      "type": "file",                S:tokenize, parse, Expression
      "name": "parser.py",          [F]calc/core.py
      "path": "calc/parser.py",      D:Core math operations
      "summary": "Tokenize..."      [C]Calculator
    },                                D:Main calculator class
    ...
  ],                                 [DEP]
  "edges": [...]                      parser.py → core.py
}                                     parser.py → plugins.py
```
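
To make the comparison concrete, here is a minimal renderer that emits the tags seen in the example (`[CTX]`, `[F]`, `D:`, `S:`, `[DEP]`) and counts whitespace-split tokens for both formats. The grammar is inferred from the example alone, the JSON layout is simplified, and `render_dsl` and `count_tokens` are hypothetical names rather than the API of `capsule/renderer.py`.

```python
import json

# Sample data modelled on the example above.
nodes = [
    {"type": "file", "path": "calc/parser.py",
     "summary": "Tokenize and parse math expressions",
     "symbols": ["tokenize", "parse", "Expression"]},
    {"type": "file", "path": "calc/core.py",
     "summary": "Core math operations", "symbols": []},
]
edges = [("parser.py", "core.py")]

def render_dsl(query, nodes, edges):
    """Render file nodes and dependency edges in the compact tag format."""
    lines = [f"[CTX]{query}"]
    for node in nodes:
        lines.append(f"[F]{node['path']}")
        if node["summary"]:
            lines.append(f"  D:{node['summary']}")
        if node["symbols"]:
            lines.append(f"  S:{', '.join(node['symbols'])}")
    if edges:
        lines.append("[DEP]")
        lines.extend(f"  {src} → {dst}" for src, dst in edges)
    return "\n".join(lines)

def count_tokens(text):
    """Whitespace-split word count, the proxy used in the benchmarks."""
    return len(text.split())

query = "calculator expression parsing"
dsl = render_dsl(query, nodes, edges)
as_json = json.dumps({"query": query, "nodes": nodes, "edges": edges}, indent=2)

print(dsl)
print("DSL tokens:", count_tokens(dsl), "JSON tokens:", count_tokens(as_json))
```

Most of the saving comes from dropping repeated keys, quotes, and braces, which JSON spends on every node.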

---

## Commands

### `ctx build` — Build knowledge graph

```bash
# Current directory
ctx build

# Specific repo
ctx build /path/to/project

# Custom exclude patterns
ctx build --exclude "vendor/*" --exclude "legacy/*"
```
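
Exclude patterns such as `vendor/*` behave like shell globs. The sketch below shows one plausible way to apply them with the standard-library `fnmatch` module. The `is_excluded` helper is hypothetical, and the real matching rules in `exclude/patterns.py` may differ.

```python
from fnmatch import fnmatchcase

def is_excluded(rel_path: str, patterns: list[str]) -> bool:
    """Return True if a repo-relative POSIX path matches any glob pattern.

    Note: in fnmatch, '*' also matches '/', so 'vendor/*' covers nested files.
    """
    return any(fnmatchcase(rel_path, pattern) for pattern in patterns)

files = ["src/app.py", "vendor/lib/util.py", "legacy/old.py"]
patterns = ["vendor/*", "legacy/*"]
kept = [path for path in files if not is_excluded(path, patterns)]
print(kept)
```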

### `ctx capsule <query>` — Generate context

```bash
# Balanced (default: 20 nodes, depth 2)
ctx capsule "fix JWT token validation"

# Fast (10 nodes, depth 1)
ctx capsule "fix JWT token validation" --mode fast

# Deep (40 nodes, depth 3)
ctx capsule "fix JWT token validation" --mode deep

# Project architecture overview
ctx capsule --overview
```

### `ctx query <search>` — Search graph

```bash
ctx query "user auth"
ctx query "payment gateway" --mode deep
```

Returns ranked nodes with relevance scores.
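
As a rough illustration of the ranking, the sketch below applies the weights from the How It Works diagram (2× for a name match, 0.5× for a text match). The tokenizer, the substring matching, and the node fields are assumptions made for this example, not the logic in `graph/query.py`.

```python
import re

NAME_WEIGHT = 2.0   # keyword found in the node name
TEXT_WEIGHT = 0.5   # keyword found in the node summary text

def tokenize(query):
    """Lowercase the query and split it into alphanumeric keywords."""
    return re.findall(r"[a-z0-9]+", query.lower())

def score(node, keywords):
    """Sum the weights for every keyword that appears in the name or summary."""
    name = node["name"].lower()
    text = node.get("summary", "").lower()
    total = 0.0
    for keyword in keywords:
        if keyword in name:
            total += NAME_WEIGHT
        if keyword in text:
            total += TEXT_WEIGHT
    return total

# Hypothetical nodes for illustration.
nodes = [
    {"name": "auth/jwt.py", "summary": "Validate and refresh JWT tokens"},
    {"name": "auth/session.py", "summary": "Session storage for user auth"},
    {"name": "billing/invoice.py", "summary": "Invoice rendering"},
]

keywords = tokenize("auth jwt validate")
scored = sorted(((score(n, keywords), n["name"]) for n in nodes), reverse=True)
ranked = [(s, name) for s, name in scored if s > 0]  # drop non-matching nodes
print(ranked)
```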

### `ctx view` — Visualize graph

```bash
ctx view
ctx view --output graph.html
ctx view --port 8080 --no-open
```

Generates an interactive, force-directed D3.js graph as HTML; no JavaScript toolchain is required.

### `ctx serve` — MCP server

```bash
# Quote the extra so shells like zsh do not expand the brackets
pip install "ctxgraph[mcp]"
ctx serve
```

Starts an MCP server over stdio. Example Claude Desktop config:

```json
{
  "mcpServers": {
    "ctxgraph": {
      "command": "ctx",
      "args": ["serve"]
    }
  }
}
```

Tools: `search_graph`, `get_context_capsule`, `get_file_dependencies`, `get_project_overview`.

### `ctx info` — Graph statistics

```bash
ctx info
# ┌────────────────────┬───────┐
# │ Total Nodes        │ 1090  │
# │ Total Edges        │ 1565  │
# │   files            │ 147   │
# │   classes          │ 45    │
# │   functions        │ 312   │
# └────────────────────┴───────┘
```

---

## Claude Wrapper (`ccg`)

```bash
# Single-shot
ccg "fix the JWT expiry bug in auth module"

# Interactive session with context pre-loaded
ccg --chat "refactor the payment flow"

# Project overview
ccg --overview

# With specific mode
ccg --mode deep "redesign the database schema"
```

---

## Modes

| Mode | Max Nodes | BFS Depth | Use Case |
|------|-----------|-----------|----------|
| `fast` | 10 | 1 | Quick questions, small fixes |
| `balanced` (default) | 20 | 2 | General development |
| `deep` | 40 | 3 | Complex refactoring, architecture |
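
The node and depth limits in the table map naturally onto a bounded breadth-first search. The sketch below assumes a simple adjacency dict and a hypothetical `expand` function. The actual traversal in `graph/query.py` may treat edge direction or ordering differently.

```python
from collections import deque

# (max_nodes, max_depth) per mode, taken from the table above.
MODES = {"fast": (10, 1), "balanced": (20, 2), "deep": (40, 3)}

def expand(adjacency, seeds, mode="balanced"):
    """Breadth-first expansion from seed nodes, capped by node count and depth."""
    max_nodes, max_depth = MODES[mode]
    selected = list(dict.fromkeys(seeds))[:max_nodes]  # de-duplicate, keep order
    visited = set(selected)
    queue = deque((node, 0) for node in selected)
    while queue and len(selected) < max_nodes:
        node, depth = queue.popleft()
        if depth == max_depth:
            continue  # do not expand beyond the depth limit
        for neighbor in adjacency.get(node, ()):
            if neighbor not in visited:
                visited.add(neighbor)
                selected.append(neighbor)
                queue.append((neighbor, depth + 1))
                if len(selected) >= max_nodes:
                    break
    return selected

# Hypothetical dependency graph for illustration.
graph = {
    "login.py": ["session.py", "database.py"],
    "session.py": ["cache.py"],
    "cache.py": ["redis_client.py"],
}

for mode in MODES:
    print(mode, expand(graph, ["login.py"], mode))
```

On this small graph each mode reaches one more hop than the previous one, which is the trade-off the table describes: wider context at the cost of a larger capsule.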

---

## Configuration

`.ctxgraph/config.toml` (or `.ctxgraph/config.json`):

```toml
[graph]
exclude = ["legacy/*", "vendor/*"]

[ai]
provider = "ollama"           # ollama, claude, openai, custom
model = "qwen2.5-coder:7b"
endpoint = "http://localhost:11434"

[context]
mode = "balanced"
max_nodes = 20
max_depth = 2
```

Environment overrides:

| Variable | Overrides |
|----------|-----------|
| `CTXGRAPH_PROVIDER` | `ai.provider` |
| `CTXGRAPH_MODEL` | `ai.model` |
| `CTXGRAPH_ENDPOINT` | `ai.endpoint` |
| `ANTHROPIC_API_KEY` | Claude API key |
| `OPENAI_API_KEY` | OpenAI API key |
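
The override behaviour can be pictured as a layered merge. The sketch below assumes a precedence of defaults, then file config, then environment variables, and reuses the values from the example config as stand-in defaults. The `resolve` function is hypothetical and is not the code in `config/settings.py`.

```python
import os

# Stand-in defaults copied from the example config above (an assumption).
DEFAULTS = {
    "ai": {"provider": "ollama", "model": "qwen2.5-coder:7b",
           "endpoint": "http://localhost:11434"},
    "context": {"mode": "balanced", "max_nodes": 20, "max_depth": 2},
}

# Environment variable -> (section, key), as listed in the table above.
ENV_OVERRIDES = {
    "CTXGRAPH_PROVIDER": ("ai", "provider"),
    "CTXGRAPH_MODEL": ("ai", "model"),
    "CTXGRAPH_ENDPOINT": ("ai", "endpoint"),
}

def resolve(file_config, environ=None):
    """Merge defaults, file config, and environment overrides (assumed order)."""
    environ = os.environ if environ is None else environ
    merged = {section: dict(values) for section, values in DEFAULTS.items()}
    for section, values in file_config.items():
        merged.setdefault(section, {}).update(values)
    for variable, (section, key) in ENV_OVERRIDES.items():
        if environ.get(variable):  # unset or empty variables are ignored
            merged[section][key] = environ[variable]
    return merged

cfg = resolve(
    {"context": {"mode": "deep"}},
    {"CTXGRAPH_PROVIDER": "openai", "CTXGRAPH_MODEL": "gpt-4o"},
)
print(cfg)
```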

### Provider Switching

```bash
# Ollama (default, no API key)
ctx capsule "query"

# Claude
CTXGRAPH_PROVIDER=claude CTXGRAPH_MODEL=claude-sonnet-4-20250514 ctx capsule "query"

# OpenAI
CTXGRAPH_PROVIDER=openai CTXGRAPH_MODEL=gpt-4o ctx capsule "query"

# Custom (OpenAI-compatible API)
CTXGRAPH_PROVIDER=custom CTXGRAPH_ENDPOINT=http://my-api/v1 ctx capsule "query"
```

---

## Benchmark Results

### Methodology

The token benchmarks use a whitespace-split word count as a reproducible proxy for LLM token usage; counts from a real tokenizer will differ. The Ollama comparison measures keyword coverage instead.

#### Token Efficiency (Capsule vs Raw Files)

**Baseline ("without graph"):** All `.py` files in the project directory (excluding directories such as `__pycache__`, `.git`, and `venv`). This is an upper bound: it represents an AI assistant reading the entire codebase rather than only the relevant files.

**Measurement:** For each project, we build the graph once, then run multiple queries across all three modes (fast/balanced/deep). The capsule token count is averaged across queries and compared against the raw file token count.

**Savings formula:** `(1 - capsule_tokens / raw_tokens) × 100`

| Project | Files | Raw Tokens | Avg Capsule Tokens | **Avg Saved** | Build Time |
|---------|-------|-----------|-------------------|:------------:|:----------:|
| tiny_app | 7 | 1,558 | ~112 | **92.8%** | ~82ms |
| web_api | 23 | 6,567 | ~136 | **97.9%** | ~474ms |
| microsvc | 22 | 10,587 | ~63 | **99.4%** | ~916ms |
| dataflow | 35 | ~12,500 | ~78 | **~99.4%** | ~560ms |

**Overall: 97.0% average token savings** across all 4 projects and 42 benchmark runs.
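
The per-project percentages follow directly from the savings formula. The short check below recomputes three rows from the table; the capsule counts are the approximate averages reported there, and the helper names are illustrative.

```python
def count_tokens(text: str) -> int:
    """Whitespace-split word count, the proxy described in the methodology."""
    return len(text.split())

def savings_percent(capsule_tokens: float, raw_tokens: float) -> float:
    """(1 - capsule_tokens / raw_tokens) x 100."""
    if raw_tokens <= 0:
        raise ValueError("raw_tokens must be positive")
    return (1 - capsule_tokens / raw_tokens) * 100

# (average capsule tokens, raw tokens) from the table above.
table = {
    "tiny_app": (112, 1558),
    "web_api": (136, 6567),
    "microsvc": (63, 10587),
}

saved = {name: round(savings_percent(c, r), 1) for name, (c, r) in table.items()}
print(saved)
```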

#### DSL vs JSON Format Efficiency

**Methodology:** For the same set of nodes and edges, we render both a DSL capsule and an equivalent JSON structure. We compare token counts across both representations.

| Project | Query | DSL Tokens | JSON Tokens | **Ratio** |
|---------|-------|:----------:|:-----------:|:---------:|
| tiny_app | calculator | 147 | 434 | **3.0×** |
| tiny_app | parse expression | 137 | 451 | **3.3×** |
| web_api | user management | 126 | 403 | **3.2×** |
| web_api | JWT auth login | 136 | 308 | **2.3×** |
| microsvc | auth service | 32 | 219 | **6.8×** |
| microsvc | payment billing | 42 | 395 | **9.4×** |

**Overall: 4.7× fewer tokens** than the equivalent JSON representation (mean of the per-query ratios).
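
The overall figure can be reproduced from the table. The check below shows that 4.7× matches the mean of the per-query ratios, while pooling all tokens gives a lower ratio because the two microsvc queries have very small DSL capsules.

```python
from statistics import mean

# (DSL tokens, JSON tokens) for each row of the table above.
rows = [(147, 434), (137, 451), (126, 403), (136, 308), (32, 219), (42, 395)]

ratios = [json_tokens / dsl_tokens for dsl_tokens, json_tokens in rows]
mean_ratio = mean(ratios)  # average of the per-query ratios

total_dsl = sum(dsl_tokens for dsl_tokens, _ in rows)
total_json = sum(json_tokens for _, json_tokens in rows)
pooled_ratio = total_json / total_dsl  # ratio of the summed token counts

print([round(r, 1) for r in ratios])
print(round(mean_ratio, 1), round(pooled_ratio, 1))
```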

#### Ollama Comparison (With vs Without Graph)

**Methodology:** We compare LLM answer quality (keyword recall coverage) with and without ctxgraph context. For each query:
1. **Without graph:** Ask Ollama the question directly (no code context)
2. **With graph:** Build a context capsule from the graph, prepend it to the same question
3. **Coverage score:** % of predefined keywords (file names, concepts) that appear in the answer
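
The coverage score in step 3 can be sketched as a case-insensitive substring check. The keywords and both answers below are invented for illustration, and the real benchmark script may match keywords differently.

```python
def coverage(answer: str, keywords: list[str]) -> float:
    """Percentage of keywords that appear in the answer (case-insensitive)."""
    if not keywords:
        return 0.0
    text = answer.lower()
    hits = sum(1 for keyword in keywords if keyword.lower() in text)
    return 100 * hits / len(keywords)

# Hypothetical keywords and answers, not taken from the benchmark data.
keywords = ["registry.py", "PluginRegistry", "register"]
without_ctx = "Plugins are usually discovered through entry points and then register themselves."
with_ctx = "PluginRegistry in registry.py exposes register() to add plugins."

score_without = coverage(without_ctx, keywords)
score_with = coverage(with_ctx, keywords)
print(round(score_without), round(score_with))
```

A generic answer can still hit generic keywords, so this metric rewards naming project-specific files and classes more than it measures correctness.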

| Query | Coverage (no ctx) | Coverage (with ctx) | Δ |
|-------|:-----------------:|:-------------------:|:-:|
| Calculator expression parsing (tiny_app) | 100% | 100% | — |
| Plugin registration system (tiny_app) | 33% | **100%** | **+67pp** |
| JWT authentication (web_api) | 75% | **100%** | **+25pp** |
| Middleware pipeline (web_api) | 100% | 100% | — |
| Circuit breaker (microsvc) | 75% | 75% | — |
| Services & communication (microsvc) | 50% | **100%** | **+50pp** |
| PipelineBuilder pattern (dataflow) | 100% | 75% | -25pp |
| Processor registration (dataflow) | 33% | **67%** | **+34pp** |
| Event bus & error handling (dataflow) | 100% | 100% | — |

**Results:** Average coverage improvement of **+16.7pp**. Coverage improved on **4/9** queries (44%). For project-specific questions (plugin system, services, processors), the graph provides concrete file and class names the model cannot guess from training data alone.

> **Note:** The one regression (PipelineBuilder) occurred because without context the model gave a generic answer matching all keywords, while with context it focused on the actual codebase implementation and missed the "scheduler" keyword — a more honest and useful answer for the developer.

---

## Examples

### Debug a failing test

```bash
ctx build
ctx capsule "test_user_login is failing with auth error" --mode deep
# Output →
# [F]tests/test_auth.py
# [F]src/auth/login.py
# [C]AuthService
# [DEP] auth/login.py → core/database.py, auth/session.py
```

### Understand a new codebase

```bash
ctx capsule "project architecture" --overview
ccg --chat "explain the overall architecture and data flow"
```

### Refactor across modules

```bash
ctx capsule "extract payment processing into separate module" --mode deep
```

---

## Development

```bash
git clone https://github.com/shashi3070/ctxgraph.git
cd ctxgraph

# Install with dev dependencies
pip install -e ".[dev]"

# Run tests
pytest

# Run benchmarks
python benchmarks/run_benchmarks.py

# Ollama comparison (requires local Ollama)
python benchmarks/run_ollama_comparison.py
```

### Project Structure

```
src/ctxgraph/
├── cli/main.py              — Typer CLI (6 commands)
├── graph/
│   ├── models.py            — Node, Edge, Graph dataclasses
│   ├── storage.py           — SQLite persistence
│   ├── builder.py           — Graph build orchestrator
│   └── query.py             — Tokenizer + BFS + relevance scoring
├── capsule/renderer.py      — DSL context generation
├── analyzers/python/
│   ├── importer.py          — AST import extraction
│   ├── symbols.py           — AST class/function/method analysis
│   └── semantic.py          — Docstring summarization
├── config/
│   ├── settings.py          — TOML/JSON/env config loading
│   └── providers.py         — Ollama, Claude, OpenAI API clients
├── clients/models.py        — Model mode enum (fast/balanced/deep)
├── exclude/patterns.py      — Exclusion pattern matching
├── view/visualizer.py       — D3.js HTML graph generator
├── wrapper/claude.py        — ccg Claude wrapper
└── mcp/server.py            — MCP protocol server
```

---

## Known Limitations

- **Python-only analysis** — other languages get file-level nodes only
- **Keyword-based search** — no semantic/embedding matching (planned)
- **No incremental rebuild** — full rebuild on every `ctx build` (planned)
- **MCP server** — stdio mode only, SSE not yet supported

---

## License

MIT
