Metadata-Version: 2.4
Name: diffctx
Version: 1.7.1
Classifier: Development Status :: 5 - Production/Stable
Classifier: Environment :: Console
Classifier: Intended Audience :: Developers
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3 :: Only
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Programming Language :: Python :: 3.14
Classifier: Topic :: Software Development
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Classifier: Topic :: Software Development :: Version Control :: Git
Classifier: Topic :: Utilities
Classifier: Typing :: Typed
Requires-Dist: pathspec>=0.11,<2.0
Requires-Dist: black>=23.0.0,<27.0 ; extra == 'dev'
Requires-Dist: charset-normalizer>=3.0,<4.0 ; extra == 'dev'
Requires-Dist: coverage>=7.0,<8.0 ; extra == 'dev'
Requires-Dist: diffctx[tree-sitter] ; extra == 'dev'
Requires-Dist: hypothesis>=6.0,<7.0 ; extra == 'dev'
Requires-Dist: import-linter>=2.0,<3.0 ; extra == 'dev'
Requires-Dist: mypy>=1.0,<3.0 ; extra == 'dev'
Requires-Dist: pre-commit>=3.0,<5.0 ; extra == 'dev'
Requires-Dist: pygit2>=1.12,<2.0 ; extra == 'dev'
Requires-Dist: pytest>=7.0,<10.0 ; extra == 'dev'
Requires-Dist: pytest-cov>=3.0,<8.0 ; extra == 'dev'
Requires-Dist: pytest-timeout>=2.1,<3.0 ; extra == 'dev'
Requires-Dist: pytest-xdist>=3.0,<4.0 ; extra == 'dev'
Requires-Dist: pyyaml>=6.0.2,<8.0 ; extra == 'dev'
Requires-Dist: radon>=6.0,<7.0 ; extra == 'dev'
Requires-Dist: ruff>=0.4,<1.0 ; extra == 'dev'
Requires-Dist: tiktoken>=0.9,<1.0 ; extra == 'dev'
Requires-Dist: types-pyyaml>=6.0,<7.0 ; extra == 'dev'
Requires-Dist: numpy>=1.24,<3.0 ; extra == 'diffctx'
Requires-Dist: charset-normalizer>=3.0,<4.0 ; extra == 'full'
Requires-Dist: diffctx[tree-sitter] ; extra == 'full'
Requires-Dist: anyio>=4.5,<5.0 ; extra == 'mcp'
Requires-Dist: mcp>=1.27,<2.0 ; extra == 'mcp'
Requires-Dist: tree-sitter>=0.21,<1.0 ; extra == 'tree-sitter'
Requires-Dist: tree-sitter-c>=0.21,<1.0 ; extra == 'tree-sitter'
Requires-Dist: tree-sitter-c-sharp>=0.21,<1.0 ; extra == 'tree-sitter'
Requires-Dist: tree-sitter-cpp>=0.22,<1.0 ; extra == 'tree-sitter'
Requires-Dist: tree-sitter-elixir>=0.3,<1.0 ; extra == 'tree-sitter'
Requires-Dist: tree-sitter-go>=0.21,<1.0 ; extra == 'tree-sitter'
Requires-Dist: tree-sitter-html>=0.23,<1.0 ; extra == 'tree-sitter'
Requires-Dist: tree-sitter-java>=0.21,<1.0 ; extra == 'tree-sitter'
Requires-Dist: tree-sitter-javascript>=0.21,<1.0 ; extra == 'tree-sitter'
Requires-Dist: tree-sitter-lua>=0.5,<1.0 ; extra == 'tree-sitter'
Requires-Dist: tree-sitter-php>=0.24,<1.0 ; extra == 'tree-sitter'
Requires-Dist: tree-sitter-python>=0.21,<1.0 ; extra == 'tree-sitter'
Requires-Dist: tree-sitter-ruby>=0.21,<1.0 ; extra == 'tree-sitter'
Requires-Dist: tree-sitter-rust>=0.21,<1.0 ; extra == 'tree-sitter'
Requires-Dist: tree-sitter-scala>=0.24,<1.0 ; extra == 'tree-sitter'
Requires-Dist: tree-sitter-swift>=0.0.1,<1.0 ; extra == 'tree-sitter'
Requires-Dist: tree-sitter-typescript>=0.21,<1.0 ; extra == 'tree-sitter'
Provides-Extra: dev
Provides-Extra: diffctx
Provides-Extra: full
Provides-Extra: mcp
Provides-Extra: tree-sitter
License-File: LICENSE
Summary: Export codebase structure and contents for AI/LLM context
Keywords: ai,chatgpt,claude,code-analysis,code-context,code-review,code-to-prompt,codebase,context,context-selection,diff-context,directory-tree,export,git-diff,json,llm,llm-context,mcp,model-context-protocol,tree,yaml
Author-email: Nikolay Eremeev <nikolay.eremeev@outlook.com>
License-Expression: Apache-2.0
Requires-Python: >=3.10
Description-Content-Type: text/markdown; charset=UTF-8; variant=GFM
Project-URL: Changelog, https://github.com/nikolay-e/diffctx/releases
Project-URL: Homepage, https://github.com/nikolay-e/diffctx
Project-URL: Issues, https://github.com/nikolay-e/diffctx/issues
Project-URL: Repository, https://github.com/nikolay-e/diffctx

# diffctx — smart diff context for LLM code review

[![CI](https://github.com/nikolay-e/diffctx/actions/workflows/ci.yml/badge.svg)](https://github.com/nikolay-e/diffctx/actions/workflows/ci.yml)
[![PyPI](https://img.shields.io/pypi/v/diffctx)](https://pypi.org/project/diffctx/)
[![License](https://img.shields.io/pypi/l/diffctx)](https://pypi.org/project/diffctx/)

**diffctx selects the minimum code an LLM needs to review a git diff.**
Instead of pasting whole files, it walks the dependency graph from the changed
lines outward and stops as soon as additional context stops paying for itself.

## Why not just use `tree` or repomix?

| | `tree` | repomix | Claude Code Review | **diffctx** |
|---|:---:|:---:|:---:|:---:|
| **Primary use case** | directory listing | full repo export | automated PR review | **diff context for code review** |
| Smart diff context | ✗ | ✗ | ✓ | ✓ |
| Works with any LLM | ✓ | ✓ | Claude only | ✓ |
| Free / local / offline | ✓ | ✓ | $15–25/review | ✓ |
| GitHub required | ✗ | ✗ | ✓ | ✗ |
| Multiple output formats | ✗ | limited | — | YAML/JSON/MD/txt |
| Python API | ✗ | ✗ | ✗ | ✓ |
| MCP server | ✗ | ✗ | ✗ | ✓ |

## Install (30 seconds)

```bash
pip install diffctx                     # canonical
pipx install diffctx                    # or: isolated, no venv needed
pip install 'diffctx[tree-sitter]'      # + AST parsing for smarter diff context
pip install 'diffctx[mcp]'              # + MCP server for AI assistants
```

```bash
diffctx . --diff HEAD~1       # smart context for last commit → paste into Claude/ChatGPT
diffctx . -f md -c            # full export → clipboard in Markdown
```

![diffctx demo: running `diffctx . --diff HEAD~1` inside a git repo and copying the relevance-ranked YAML output to the clipboard for an LLM](https://raw.githubusercontent.com/nikolay-e/diffctx/main/docs/demo.gif)

*Demo: `diffctx . --diff HEAD~1` selects only the fragments — functions,
imports, type definitions — that an LLM actually needs to review the last
commit, instead of dumping every changed file in full.*

**Standalone binary** (no Python required): download from the
[releases page](https://github.com/nikolay-e/diffctx/releases/latest).

> Diff context mode works out of the box. Adding `[tree-sitter]` enables AST-level
> parsing for more accurate context selection across 12 languages.

## Diff Context Mode

Automatically finds the minimal set of code fragments needed to understand
a change — imports, callers, type definitions, config dependencies — without
dumping entire files. Understands 50+ file types.

```yaml
name: myproject
type: diff_context
fragment_count: 5
fragments:
  - path: src/main.py
    lines: "10-25"
    kind: function
    symbol: process_data
    content: |
      def process_data(items):
          ...
```

### How it works

Builds a code graph (imports, co-changes, type refs) and propagates
relevance from changed lines outward across it. Three scoring modes are
available — pick one with `--scoring`:

| `--scoring` | What it does                                              |
|-------------|-----------------------------------------------------------|
| `ego` (default) | Bounded ego-network expansion around changed nodes — fast, predictable radius, the current default |
| `ppr`       | Personalized PageRank with damping `--alpha` — global, smoother decay, slower |
| `bm25`      | Lexical fragment retrieval against the diff hunks — useful as a baseline / fallback when the graph is sparse |

Selection stops when relevance drops below `--tau` (the minimum score a
fragment must beat to be kept), or once `--budget` tokens have been
emitted, whichever comes first.

| Flag        | Default | Description                                                              |
|-------------|---------|--------------------------------------------------------------------------|
| `--scoring` | `ego`   | Scoring mode: `ego`, `ppr`, or `bm25`                                    |
| `--budget`  | auto    | Token cap. `auto` lets selection converge; `-1` disables the cap; `N` enforces a fixed cap |
| `--alpha`   | 0.60    | How tightly context clusters around changes (PPR damping; 0–1, higher = more focused) |
| `--tau`     | 0.08    | Minimum relevance required to include a fragment (lower = more context)  |
| `--full`    | false   | Include every changed fragment; skip the smart-selection step entirely   |

Calibration of `--alpha`, `--tau`, and the edge-weight priors is documented
in [`docs/parameter-strategy.md`](docs/parameter-strategy.md).

*Theory: [Context-Selection for Git Diff (Zenodo, 2026)](https://doi.org/10.5281/zenodo.18824580).*

### `graph` subcommand

For exploring the underlying dependency graph directly (without a diff),
use the `graph` subcommand:

```bash
diffctx graph .                                  # Mermaid graph of directory deps (default)
diffctx graph . --summary                        # cycles, hotspots, coupling metrics
diffctx graph . --level fragment -f json         # fragment-level graph as JSON
diffctx graph . --level file -f graphml -o g.xml # file-level graph as GraphML
```

| Flag        | Default      | Description                                              |
|-------------|--------------|----------------------------------------------------------|
| `-f/--format` | `mermaid`  | Output format: `mermaid`, `json`, or `graphml`           |
| `--level`   | `directory`  | Granularity: `fragment`, `file`, or `directory`          |
| `--summary` | false        | Print graph statistics (cycles, hotspots, coupling)      |

## Usage

<!-- BEGIN USAGE -->
```bash
# full codebase export:
diffctx .                                # YAML to stdout + token count
diffctx . -f md -c                       # Markdown → clipboard
diffctx . -f json -o tree.json           # JSON → file
diffctx . --no-content                   # structure only, no file contents
diffctx . --max-depth 3                  # limit depth
diffctx . -i custom.ignore               # custom ignore patterns

# diff context mode (requires git repo):
diffctx . --diff HEAD~1                  # context for last commit
diffctx . --diff main..feature           # context for feature branch
diffctx . --diff HEAD~1 --budget 30000   # limit to ~30k tokens
diffctx . --diff HEAD~1 -c               # diff context to clipboard
```
<!-- END USAGE -->

Full codebase export output format:

```yaml
name: myproject
type: directory
children:
  - name: main.py
    type: file
    content: |
      def hello():
          print("Hello, World!")
  - name: utils/
    type: directory
    children:
      - name: helpers.py
        type: file
        content: |
          def add(a, b):
              return a + b
```

## Token Counting

Token count and size are always displayed on stderr:

```text
12,847 tokens (o200k_base), 52.3 KB
```

For large outputs (>1MB), approximate counts with `~` prefix:

```text
~125,000 tokens (o200k_base), 5.2 MB
```

Uses tiktoken with `o200k_base` encoding (GPT-4o tokenizer).

## Clipboard Support

Copy output directly to clipboard with `-c` or `--copy`:

```bash
diffctx . -c                       # copy (stdout suppressed, stderr: token count)
diffctx . -c -o tree.yaml          # copy + save to file
```

**System Requirements:**

- **macOS:** `pbcopy` (pre-installed)
- **Windows:** `clip` (pre-installed)
- **Linux (Wayland):** `wl-copy`
- **Linux (X11):** `xclip` or `xsel`

## Python API

```python
from diffctx import map_directory
from diffctx import to_yaml, to_json, to_text, to_markdown

tree = map_directory(
    path,                     # directory path
    max_depth=None,           # limit traversal depth
    no_content=False,         # exclude file contents
    max_file_bytes=None,      # skip large files
    ignore_file=None,         # custom ignore file
    no_default_ignores=False, # disable default ignores
    whitelist_file=None,      # include-only filter
)

yaml_str = to_yaml(tree)
json_str = to_json(tree)
text_str = to_text(tree)
md_str = to_markdown(tree)

# Diff context mode
from pathlib import Path
from diffctx import build_diff_context, to_yaml

ctx = build_diff_context(
    Path("."),                # repository root
    "HEAD~1..HEAD",           # diff range; also accepts "main..feature"
    budget_tokens=None,       # None = convergence-based (default)
                              #   0  = diff only, no expansion (recall floor)
                              #  <0  = unlimited (10M-token soft ceiling)
                              #  >0  = explicit token cap
    alpha=0.6,                # PPR damping factor
    tau=0.08,                 # stopping threshold
    full=False,               # skip smart selection
)
yaml_str = to_yaml(ctx)
```

## MCP Server

diffctx includes an [MCP](https://modelcontextprotocol.io) server that lets
AI assistants (Claude Code, Cursor, Windsurf, etc.) call diff context analysis
automatically during code review.

```bash
pip install 'diffctx[mcp]'
```

Add to your MCP client config (e.g. `~/.claude/mcp.json` for Claude Code):

```json
{
  "mcpServers": {
    "diffctx": {
      "command": "diffctx-mcp"
    }
  }
}
```

The server exposes a `get_diff_context` tool. Your AI assistant will
automatically call it when reviewing PRs, explaining changes, or investigating
broken tests — no manual invocation needed.

See [`src/diffctx/mcp/README.md`](src/diffctx/mcp/README.md) for configs
for Cursor, Continue, Windsurf, and Zed.

## Ignore Patterns

Respects `.gitignore` and `.diffctx/ignore` automatically.
Use `--no-default-ignores` to disable built-in patterns
(`.gitignore` and `.diffctx/ignore` still apply).

- Hierarchical: nested ignore files at each directory level
- Negation patterns: `!important.log` un-ignores a file
- Anchored patterns: `/root_only.txt` matches only in root
- Output file is always auto-ignored

Auto-discovered files:

- `.diffctx/ignore` — diffctx-specific ignore patterns
- `.diffctx/whitelist` — Include-only filter (only matched files included)

## Content Placeholders

- `<file too large: N bytes>` — exceeds `--max-file-bytes`
- `<binary file: N bytes>` — binary file detected
- `<unreadable content: not utf-8>` — not valid UTF-8
- `<unreadable content>` — permission denied or I/O error

## License

Apache 2.0

---

- [Changelog](CHANGELOG.md)
- [Security policy](SECURITY.md) — threat model and vulnerability reporting
- [Parameter strategy](docs/parameter-strategy.md) — how `--alpha`,
  `--tau`, and edge weights are calibrated

