Metadata-Version: 2.4
Name: axe-dig
Version: 1.6.0
Summary: AST, Call Graph, CFG, DFG, PDG.
Author: SRSWTI Research Labs
Maintainer: SRSWTI Research Labs
License: AGPL-3.0
Project-URL: Homepage, https://github.com/SRSWTI/axe-dig
Project-URL: Repository, https://github.com/SRSWTI/axe-dig
Project-URL: Issues, https://github.com/SRSWTI/axe-dig/issues
Project-URL: Documentation, https://github.com/SRSWTI/axe-dig#readme
Keywords: code-analysis,llm,ai,tree-sitter,call-graph,cfg,dfg,pdg,mcp,static-analysis
Classifier: Development Status :: 5 - Production/Stable
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: GNU Affero General Public License v3
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Software Development :: Code Generators
Classifier: Topic :: Software Development :: Documentation
Classifier: Topic :: Text Processing :: Linguistic
Requires-Python: >=3.10
Description-Content-Type: text/markdown
Requires-Dist: pygments-tldr>=2.19.1.3
Requires-Dist: requests>=2.25.0
Requires-Dist: mcp>=1.0.0
Requires-Dist: pathspec>=0.12.0
Requires-Dist: tree-sitter>=0.23.0
Requires-Dist: tree-sitter-python>=0.23.0
Requires-Dist: tree-sitter-typescript>=0.23.0
Requires-Dist: tree-sitter-javascript>=0.23.0
Requires-Dist: tree-sitter-go>=0.23.0
Requires-Dist: tree-sitter-rust>=0.23.0
Requires-Dist: tree-sitter-java>=0.23.0
Requires-Dist: tree-sitter-c>=0.21.0
Requires-Dist: tree-sitter-cpp>=0.20.0
Requires-Dist: tree-sitter-ruby>=0.20.0
Requires-Dist: tree-sitter-php>=0.20.0
Requires-Dist: tree-sitter-c-sharp>=0.20.0
Requires-Dist: tree-sitter-kotlin>=1.0.0
Requires-Dist: tree-sitter-scala>=0.23.0
Requires-Dist: tree-sitter-lua<=0.4.1
Requires-Dist: tree-sitter-luau
Requires-Dist: tree-sitter-elixir>=0.3.0
Requires-Dist: sentence-transformers>=5.2.0
Requires-Dist: faiss-cpu>=1.13.2
Requires-Dist: anthropic>=0.75.0
Requires-Dist: tiktoken>=0.12.0
Requires-Dist: rich>=14.2.0
Provides-Extra: ai
Requires-Dist: anthropic>=0.3.0; extra == "ai"
Requires-Dist: openai>=1.0.0; extra == "ai"
Provides-Extra: cli
Requires-Dist: rich>=13.0; extra == "cli"
Requires-Dist: shtab>=1.7.0; extra == "cli"
Provides-Extra: dev
Requires-Dist: pytest>=6.0; extra == "dev"
Requires-Dist: pytest-cov>=2.0; extra == "dev"
Requires-Dist: black>=21.0; extra == "dev"
Requires-Dist: flake8>=3.8; extra == "dev"
Requires-Dist: mypy>=0.900; extra == "dev"
Requires-Dist: rich>=13.0; extra == "dev"
Requires-Dist: shtab>=1.7.0; extra == "dev"

# DIG: Code Analysis for AI Agents

**Give LLMs exactly the code they need. Nothing more.**

```bash
# One-liner: Install, index, search
pip install axe-dig && chop warm . && chop semantic "what you're looking for" .
```

Your codebase is 100K lines. Claude's context window is 200K tokens. Raw code won't fit, and even if it did, the LLM would drown in irrelevant details.

DIG extracts *structure* instead of dumping *text*. The result: **95% fewer tokens** while preserving everything needed to understand and edit code correctly.

```bash
pip install axe-dig
chop warm .                    # Index your project
chop context main --project .  # Get LLM-ready summary
```

---

## How It Works

DIG builds 5 analysis layers, each answering different questions:

```
┌─────────────────────────────────────────────────────────────┐
│ Layer 5: Program Dependence  → "What affects line 42?"      │
│ Layer 4: Data Flow           → "Where does this value go?"  │
│ Layer 3: Control Flow        → "How complex is this?"       │
│ Layer 2: Call Graph          → "Who calls this function?"   │
│ Layer 1: AST                 → "What functions exist?"      │
└─────────────────────────────────────────────────────────────┘
```

**Why layers?** Different tasks need different depth:
- Browsing code? Layer 1 (structure) is enough
- Refactoring? Layer 2 (call graph) shows what breaks
- Debugging null? Layer 5 (slice) shows only relevant lines

The daemon keeps indexes in memory, so queries take about **100ms** instead of the roughly 30 seconds needed to spawn the CLI and rebuild from scratch.

### Architecture

```
┌──────────────────────────────────────────────────────────────────┐
│                         YOUR CODE                                │
│  src/*.py, lib/*.ts, pkg/*.go                                    │
└───────────────────────────┬──────────────────────────────────────┘
                            │ tree-sitter
                            ▼
┌──────────────────────────────────────────────────────────────────┐
│                     5-LAYER ANALYSIS                             │
│  ┌─────────┐ ┌─────────┐ ┌─────────┐ ┌─────────┐ ┌─────────┐     │
│  │   AST   │→│  Calls  │→│   CFG   │→│   DFG   │→│   PDG   │     │
│  │   L1    │ │   L2    │ │   L3    │ │   L4    │ │   L5    │     │
│  └─────────┘ └─────────┘ └─────────┘ └─────────┘ └─────────┘     │
└───────────────────────────┬──────────────────────────────────────┘
                            │ bge-large-en-v1.5
                            ▼
┌──────────────────────────────────────────────────────────────────┐
│                    SEMANTIC INDEX                                │
│  1024-dim embeddings in FAISS  →  "find JWT validation"          │
└───────────────────────────┬──────────────────────────────────────┘
                            │
                            ▼
┌──────────────────────────────────────────────────────────────────┐
│                       DAEMON                                     │
│  In-memory indexes  •  100ms queries  •  Auto-lifecycle          │
└──────────────────────────────────────────────────────────────────┘
```

### The Semantic Layer: Search by Behavior

The real power comes from combining all 5 layers into **searchable embeddings**.

Every function gets indexed with:
- Signature + docstring (L1)
- What it calls + who calls it (L2)
- Complexity metrics (L3)
- Data flow patterns (L4)
- Dependencies (L5)
- First ~10 lines of actual code

This gets encoded into **1024-dimensional vectors** using `bge-large-en-v1.5`. The result: search by *what code does*, not just what it says.

```bash
# "validate JWT" finds verify_access_token() even without that exact text
chop semantic "validate JWT tokens and check expiration" .
```

**Why this works:** Traditional search finds `authentication` in variable names and comments. Semantic search understands that `verify_access_token()` *performs* JWT validation because the call graph and data flow reveal its purpose.
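
As a rough illustration of how per-layer facts could be flattened into one text document before embedding, consider the sketch below. The record layout, field names, and `build_embedding_text` helper are assumptions made for this example. DIG's actual format is not documented here, and the encoding step itself (passing the text to an embedding model) is left out.

```python
def build_embedding_text(info):
    """Flatten one function's per-layer facts into a single string to embed."""
    parts = [
        f"signature: {info['signature']}",                        # L1
        f"doc: {info.get('docstring', '')}",                      # L1
        f"calls: {', '.join(info.get('calls', []))}",             # L2
        f"called by: {', '.join(info.get('callers', []))}",       # L2
        f"complexity: {info.get('complexity', 'unknown')}",       # L3
        f"data flow: {', '.join(info.get('data_flow', []))}",     # L4
        f"depends on: {', '.join(info.get('dependencies', []))}", # L5
        "code:",
        # Keep only the first 10 lines of the body
        *info.get("code", "").splitlines()[:10],
    ]
    return "\n".join(parts)

# Hypothetical record for one function
info = {
    "signature": "verify_access_token(token: str) -> Claims",
    "docstring": "Check signature and expiry.",
    "calls": ["jwt.decode", "is_expired"],
    "callers": ["login_required"],
    "complexity": 4,
    "data_flow": ["token -> claims"],
    "dependencies": ["SECRET_KEY"],
    "code": "\n".join(f"line {i}" for i in range(1, 21)),
}
text = build_embedding_text(info)
print(text)
```

Because the callee names (`jwt.decode`, `is_expired`) end up in the embedded text, a query about JWT validation can land near this function even if its own name never mentions JWT.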

### Setting Up Semantic Search

```bash
# Build the semantic index (one-time, ~2 min for typical project)
chop warm /path/to/project

# Search by behavior
chop semantic "database connection pooling" .
```

Embedding dependencies (`sentence-transformers`, `faiss-cpu`) are included with `pip install axe-dig`. The index is cached in `.dig/cache/semantic.faiss`.

### Keeping the Index Fresh

The daemon tracks dirty files and auto-rebuilds once 20 have changed, but it only learns about changes when you notify it:

```bash
# Notify daemon of a changed file
chop daemon notify src/auth.py --project .
```

**Integration options:**

1. **Git hook** (post-commit):
   ```bash
   git diff --name-only HEAD~1 | xargs -I{} chop daemon notify {} --project .
   ```

2. **Editor hook** (on save):
   ```bash
   chop daemon notify "$FILE" --project .
   ```

3. **Manual rebuild** (when needed):
   ```bash
   chop warm .  # Full rebuild
   ```

The daemon auto-rebuilds semantic embeddings in the background once the dirty threshold (default: 20 files) is reached.
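
The threshold behavior can be pictured with a small sketch. `DirtyTracker` is a hypothetical class, not DIG's implementation, and it assumes the threshold counts distinct files rather than individual save events.

```python
class DirtyTracker:
    """Collect changed paths and signal when a rebuild is due."""

    def __init__(self, threshold=20):
        self.threshold = threshold
        self.dirty = set()

    def notify(self, path):
        """Record a changed file; return True when the threshold is reached."""
        self.dirty.add(path)  # a set, so repeated saves of one file count once
        if len(self.dirty) >= self.threshold:
            self.dirty.clear()  # the caller is expected to start the rebuild now
            return True
        return False

# Low threshold so the example stays short
tracker = DirtyTracker(threshold=3)
results = [tracker.notify(p) for p in ["a.py", "a.py", "b.py", "c.py", "d.py"]]
print(results)
```

In this sketch the fourth notification triggers the rebuild, because `a.py` was reported twice and only counts once.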

---

## The Workflow

### Before Reading Code
```bash
chop tree src/                      # See file structure
chop structure src/ --lang python   # See functions/classes
```

### Before Editing
```bash
chop extract src/auth.py            # Full file analysis
chop context login --project .      # LLM-ready summary (95% savings)
```

### Before Refactoring
```bash
chop impact login .                 # Who calls this? (reverse call graph)
chop change-impact                  # Which tests need to run?
```

### Debugging
```bash
chop slice src/auth.py login 42     # What affects line 42?
chop dfg src/auth.py login          # Trace data flow
```

### Finding Code by Behavior
```bash
chop semantic "validate JWT tokens" .   # Natural language search
```

### Advanced Analysis & Visualization
```bash
chop cycles .                           # Detect recursion loops
chop path src_func tgt_func             # Find shortest call path
# Run inject_data.py to generate the dig_visualizer.html
```

---

## Quick Setup

### 1. Install

```bash
pip install axe-dig
```

### 2. Index Your Project

```bash
chop warm /path/to/project
```

This builds all analysis layers and starts the daemon. It takes 30-60 seconds for a typical project; after that, queries return in about 100ms.

### 3. Start Using

```bash
chop context main --project .   # Get context for a function
chop impact helper_func .       # See who calls it
chop semantic "error handling"  # Find by behavior
```

---

## Real Example: Why This Matters

**Scenario:** Debug why `user` is null on line 42.

**Without DIG:**
1. Read the 150-line function
2. Trace every variable manually
3. Miss the bug because it's hidden in control flow

**With DIG:**
```bash
chop slice src/auth.py login 42
```

**Output:** Only 6 lines that affect line 42:
```python
3:   user = db.get_user(username)
7:   if user is None:
12:      raise NotFound
28:  token = create_token(user)  # ← BUG: skipped null check
35:  session.token = token
42:  return session
```

The bug is obvious. Line 28 uses `user` without going through the null check path.
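
Conceptually, a backward slice is a reachability query over dependence edges. The sketch below uses a hand-written edge table chosen to mirror the example above, plus one unrelated line (20) to show what gets filtered out. The edges are assumptions for illustration. A real slicer derives them from the data flow and control flow layers.

```python
# Hypothetical dependence edges: line -> lines it depends on (data or control)
DEPENDS_ON = {
    3: [],
    7: [3],        # the check reads user
    12: [7],       # the raise runs only if the check is true
    20: [],        # e.g. a log statement, unrelated to the returned value
    28: [3],       # token is built from user
    35: [28],      # session.token is set from token
    42: [35, 12],  # the return is reached only if the raise does not fire
}

def backward_slice(depends_on, line):
    """Collect every line the target transitively depends on, plus itself."""
    in_slice, stack = set(), [line]
    while stack:
        current = stack.pop()
        if current not in in_slice:
            in_slice.add(current)
            stack.extend(depends_on.get(current, []))
    return sorted(in_slice)

slice_lines = backward_slice(DEPENDS_ON, 42)
print(slice_lines)
```

Line 20 never appears in the result because nothing on the path to line 42 depends on it.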

---

## Command Reference

### Exploration
| Command | What It Does |
|---------|--------------|
| `chop tree [path]` | File tree |
| `chop structure [path] --lang <lang>` | Functions, classes, methods |
| `chop search <pattern> [path]` | Text pattern search |
| `chop extract <file>` | Full file analysis |

### Analysis
| Command | What It Does |
|---------|--------------|
| `chop context <func> --project <path>` | LLM-ready summary (95% savings) |
| `chop cfg <file> <function>` | Control flow graph |
| `chop dfg <file> <function>` | Data flow graph |
| `chop slice <file> <func> <line>` | Program slice |
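
For intuition about what a data flow graph records, the sketch below extracts definition-to-use edges from a straight-line Python function with the standard `ast` module. It is a simplified stand-in, not the output format of `chop dfg`. It ignores branches, loops, and attribute writes, and `def_use_edges` is a hypothetical helper.

```python
import ast

# Hypothetical function to analyze (def is on line 2 of this string)
SOURCE = '''
def login(username):
    user = get_user(username)
    token = create_token(user)
    return token
'''

def def_use_edges(func):
    """Return (variable, def_line, use_line) for names defined and then read."""
    # Parameters count as defined on the def line
    last_def = {arg.arg: func.lineno for arg in func.args.args}
    edges = []
    for stmt in func.body:
        # Reads first: a statement uses values defined by earlier statements
        for node in ast.walk(stmt):
            if isinstance(node, ast.Name) and isinstance(node.ctx, ast.Load):
                if node.id in last_def:
                    edges.append((node.id, last_def[node.id], node.lineno))
        # Then writes: record what this statement defines
        for node in ast.walk(stmt):
            if isinstance(node, ast.Name) and isinstance(node.ctx, ast.Store):
                last_def[node.id] = node.lineno
    return edges

func = ast.parse(SOURCE).body[0]
edges = def_use_edges(func)
for name, def_line, use_line in edges:
    print(f"{name}: defined on line {def_line}, used on line {use_line}")
```

Following these edges backwards from a use is what makes a program slice possible.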

### Cross-File
| Command | What It Does |
|---------|--------------|
| `chop calls [path]` | Build call graph |
| `chop impact <func> [path]` | Find all callers (reverse call graph) |
| `chop cycles [path]` | Detect recursive loops and circular deps |
| `chop path <src> <tgt> [path]` | Find shortest path between functions |
| `chop dead [path]` | Find unreachable code |
| `chop arch [path]` | Detect architecture layers |
| `chop imports <file>` | Parse imports |
| `chop importers <module> [path]` | Find files that import a module |
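
Once a call graph exists, several of these queries reduce to standard graph algorithms. The sketch below shows a breadth-first shortest call path and a simple recursion check over a hypothetical adjacency map. The graph and the function names are assumptions for illustration, not DIG internals.

```python
from collections import deque

# Hypothetical call graph: caller -> callees
CALLS = {
    "main": ["load", "parse"],
    "load": ["read_file"],
    "parse": ["tokenize", "parse"],  # parse is directly recursive
    "tokenize": [],
    "read_file": [],
}

def shortest_call_path(calls, src, tgt):
    """Breadth-first search: the first path that reaches tgt is a shortest one."""
    queue = deque([[src]])
    seen = {src}
    while queue:
        path = queue.popleft()
        if path[-1] == tgt:
            return path
        for callee in calls.get(path[-1], []):
            if callee not in seen:
                seen.add(callee)
                queue.append(path + [callee])
    return None  # tgt is not reachable from src

def recursive_functions(calls):
    """Functions that can reach themselves through one or more calls."""
    found = []
    for name in calls:
        # Start from the callees so that reaching name again means a real cycle
        stack, seen = list(calls[name]), set()
        while stack:
            current = stack.pop()
            if current == name:
                found.append(name)
                break
            if current not in seen:
                seen.add(current)
                stack.extend(calls.get(current, []))
    return found

path = shortest_call_path(CALLS, "main", "tokenize")
cycles = recursive_functions(CALLS)
print(path)
print(cycles)
```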

### Visualization
| Command | What It Does |
|---------|--------------|
| `python3 axe-dig/inject_data.py` | Generate interactive knowledge graph |
| `open dig_visualizer.html` | Open premium 5-layout visualizer |

### Semantic
| Command | What It Does |
|---------|--------------|
| `chop warm <path>` | Build all indexes (including embeddings) |
| `chop semantic <query> [path]` | Natural language code search |

### Diagnostics
| Command | What It Does |
|---------|--------------|
| `chop diagnostics <file>` | Type check + lint |
| `chop change-impact [files]` | Find tests affected by changes |
| `chop doctor` | Check/install diagnostic tools |

### Daemon
| Command | What It Does |
|---------|--------------|
| `chop daemon start` | Start background daemon |
| `chop daemon stop` | Stop daemon |
| `chop daemon status` | Check status |

---

## Supported Languages

Python, TypeScript, JavaScript, Go, Rust, Java, C, C++, Ruby, PHP, C#, Kotlin, Scala, Swift, Lua, Elixir

Language is auto-detected; override it with `--lang`.

---

## MCP Integration

For AI tools (Claude Desktop, Claude Code):

**Claude Desktop** - Add to `~/Library/Application Support/Claude/claude_desktop_config.json`:
```json
{
  "mcpServers": {
    "dig": {
      "command": "dig-mcp",
      "args": ["--project", "/path/to/your/project"]
    }
  }
}
```

**Claude Code** - Add to `.claude/settings.json`:
```json
{
  "mcpServers": {
    "dig": {
      "command": "dig-mcp",
      "args": ["--project", "."]
    }
  }
}
```

---

## Configuration

### `.digignore` - Exclude Files

DIG respects `.digignore` (gitignore syntax) for all commands including `tree`, `structure`, `search`, `calls`, and semantic indexing:

```bash
# Auto-create with sensible defaults
chop warm .  # Creates .digignore if missing
```

**Default exclusions:**
- `node_modules/`, `.venv/`, `__pycache__/`
- `dist/`, `build/`, `*.egg-info/`
- Binary files (`*.so`, `*.dll`, `*.whl`)
- Security files (`.env`, `*.pem`, `*.key`)

**Customize** by editing `.digignore`:
```gitignore
# Add your patterns
large_test_fixtures/
vendor/
data/*.csv
```

**CLI Flags:**
```bash
# Add patterns from command line (can be repeated)
chop --ignore "packages/old/" --ignore "*.generated.ts" tree .

# Bypass all ignore patterns
chop --no-ignore tree .
```
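
To see how patterns like the ones above select files, here is a deliberately simplified matcher built on the standard library. It covers only the pattern shapes shown in this section (directory names, root-anchored directories, and simple globs) and is not full gitignore semantics. Negation (`!`) and `**` are not handled, and `is_ignored` and `parse_patterns` are hypothetical helpers.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath

def parse_patterns(text):
    """Drop blank lines and # comments from ignore-file text."""
    lines = (line.strip() for line in text.splitlines())
    return [line for line in lines if line and not line.startswith("#")]

def is_ignored(path, patterns):
    """Approximate check of a relative POSIX path against ignore patterns."""
    parts = PurePosixPath(path).parts
    for pattern in patterns:
        if pattern.endswith("/"):
            name = pattern.rstrip("/")
            if "/" in name:
                # A slash in the middle anchors the directory at the project root
                if path.startswith(name + "/"):
                    return True
            elif name in parts[:-1]:
                # A bare directory name matches at any depth
                return True
        elif "/" in pattern:
            # Anchored file pattern such as data/*.csv: compare part by part
            pattern_parts = pattern.split("/")
            if len(pattern_parts) == len(parts) and all(
                fnmatchcase(part, pat) for part, pat in zip(parts, pattern_parts)
            ):
                return True
        elif fnmatchcase(parts[-1], pattern):
            # Bare glob such as *.pem matches the file name at any depth
            return True
    return False

PATTERNS = parse_patterns(
    "# Add your patterns\nvendor/\ndata/*.csv\n*.pem\npackages/old/\n"
)
for candidate in ["src/vendor/x.js", "data/users.csv", "data/raw/users.csv", "src/main.py"]:
    print(candidate, is_ignored(candidate, PATTERNS))
```

For production use, a dedicated gitignore-pattern library is a safer choice than hand-rolled matching.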

### Settings - Daemon Behavior

Create `.dig/config.json` for daemon settings:

```json
{
  "semantic": {
    "enabled": true,
    "auto_reindex_threshold": 20
  }
}
```

| Setting | Default | Description |
|---------|---------|-------------|
| `enabled` | `true` | Enable semantic search |
| `auto_reindex_threshold` | `20` | Files changed before auto-rebuild |
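
One plausible way to read this file is to overlay it on the defaults from the table, so a partial config only overrides the keys it names. The sketch below assumes that merge behavior. `load_config` is a hypothetical helper, not DIG's loader.

```python
import json
import tempfile
from pathlib import Path

# Defaults taken from the settings table
DEFAULTS = {"semantic": {"enabled": True, "auto_reindex_threshold": 20}}

def load_config(project_root):
    """Read .dig/config.json if present and overlay it on the defaults."""
    config = {section: dict(values) for section, values in DEFAULTS.items()}
    config_path = Path(project_root) / ".dig" / "config.json"
    if config_path.is_file():
        user_config = json.loads(config_path.read_text(encoding="utf-8"))
        for section, values in user_config.items():
            config.setdefault(section, {}).update(values)
    return config

# Demonstrate with a throwaway project directory
with tempfile.TemporaryDirectory() as root:
    defaults_only = load_config(root)  # no config file yet
    (Path(root) / ".dig").mkdir()
    (Path(root) / ".dig" / "config.json").write_text(
        '{"semantic": {"auto_reindex_threshold": 5}}', encoding="utf-8"
    )
    overridden = load_config(root)

print(defaults_only)
print(overridden)
```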

### Monorepo Support

For monorepos, create `.claude/workspace.json` to scope indexing:

```json
{
  "active_packages": ["packages/core", "packages/api"],
  "exclude_patterns": ["**/fixtures/**"]
}
```

---

## Performance

| Metric | Raw Code | DIG | Improvement |
|--------|----------|------|-------------|
| Tokens for function context | 21,000 | 175 | **99% savings** |
| Tokens for codebase overview | 104,000 | 12,000 | **89% savings** |
| Query latency (daemon) | 30s | 100ms | **300x faster** |

---

## Deep Dive

For the full architecture explanation, benchmarks, and advanced workflows:

**[Full Documentation](./docs/DIG.md)**

---

## License

AGPL-3.0 - See LICENSE file.
