Metadata-Version: 2.4
Name: hawkeye-analyzer
Version: 0.1.4
Summary: Architectural intelligence for AI coding agents — one call gives your editor full context before it edits
Author: Alex
License-Expression: MIT
Project-URL: Homepage, https://github.com/AlexxBenny/Hawkeye-analyze-your-codebase
Project-URL: Repository, https://github.com/AlexxBenny/Hawkeye-analyze-your-codebase
Project-URL: Changelog, https://github.com/AlexxBenny/Hawkeye-analyze-your-codebase/blob/main/CHANGELOG.md
Project-URL: Issues, https://github.com/AlexxBenny/Hawkeye-analyze-your-codebase/issues
Keywords: dependency,architecture,mcp,static-analysis,visualization
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: Topic :: Software Development :: Quality Assurance
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Requires-Python: >=3.10
Description-Content-Type: text/markdown
License-File: LICENSE
Provides-Extra: mcp
Requires-Dist: mcp>=1.0; extra == "mcp"
Provides-Extra: all
Requires-Dist: mcp>=1.0; extra == "all"
Provides-Extra: dev
Requires-Dist: pytest; extra == "dev"
Requires-Dist: mcp>=1.0; extra == "dev"
Requires-Dist: isort; extra == "dev"
Requires-Dist: black; extra == "dev"
Dynamic: license-file

# 🦅 Hawkeye

**Architectural intelligence for AI coding agents.** One call gives your AI editor full context about a file — dependencies, blast radius, cycles, health — before it writes a single line.

[![PyPI](https://img.shields.io/pypi/v/hawkeye-analyzer.svg)](https://pypi.org/project/hawkeye-analyzer/)
[![Python 3.10+](https://img.shields.io/badge/python-3.10%2B-blue.svg)](https://python.org)
[![License: MIT](https://img.shields.io/badge/license-MIT-green.svg)](LICENSE)
[![Tests](https://img.shields.io/badge/tests-271%20passed-brightgreen.svg)]()
[![Zero Dependencies](https://img.shields.io/badge/dependencies-0-brightgreen.svg)]()

---

## The Problem

AI coding agents edit files without knowing the architecture. They:
- Break imports they didn't know existed
- Refactor classes used by 20 other modules
- Create circular dependencies
- Miss that a "simple change" cascades through 34 files

**Hawkeye fixes this.** It gives the AI the same architectural awareness a senior engineer has — in one deterministic, token-efficient JSON call.

---

## Setup for AI Editors (MCP)

Hawkeye exposes **10 tools** via [Model Context Protocol](https://modelcontextprotocol.io/). Install and configure in under 60 seconds:

### 1. Install

```bash
pip install "hawkeye-analyzer[mcp]"
```

### 2. Add to your editor's MCP config

**Claude Desktop** (`claude_desktop_config.json`):
```json
{
  "mcpServers": {
    "hawkeye": {
      "command": "hawkeye-mcp",
      "args": ["--project", "/path/to/your/project"]
    }
  }
}
```

**Cursor** (`.cursor/mcp.json` in project root):
```json
{
  "mcpServers": {
    "hawkeye": {
      "command": "hawkeye-mcp",
      "args": ["--project", "."]
    }
  }
}
```

**Windsurf / Other MCP clients** — same pattern. The server uses stdio transport and pre-analyzes the project on startup.

### 3. Done

The AI editor now has access to 10 architectural intelligence tools. The most important one:

```
hawkeye_file_context("src/core/engine.py")
```

Returns everything the agent needs in **one call** (the `dependencies` and `dependents` lists are shortened here):

```json
{
  "module": "myapp.core.engine",
  "file": "core/engine.py",
  "loc": 340,
  "dependency_count": 3,
  "dependent_count": 8,
  "dependencies": [
    {"module": "myapp.core.scanner", "file": "core/scanner.py"},
    {"module": "myapp.core.analyzer", "file": "core/analyzer.py"}
  ],
  "dependents": [
    {"module": "myapp.cli", "file": "cli.py"},
    {"module": "myapp.server.mcp", "file": "server/mcp.py"}
  ],
  "impact": {"direct": 8, "transitive": 12},
  "metrics": {
    "ca": 8, "ce": 3, "instability": 0.273,
    "health": "critical",
    "cyclomatic_complexity": 45,
    "cognitive_complexity": 32
  },
  "insights": ["extreme_cyclomatic", "critical_blast_radius"],
  "risk_profile": "hub",
  "in_cycles": false
}
```
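
As a rough sketch of how an agent-side script might act on this response, the snippet below parses a trimmed copy of the JSON above and reduces it to a short pre-edit summary. The helper name `summarize_context` and the `needs_care` rule are illustrative assumptions, not part of Hawkeye; only the field names come from the sample output.

```python
import json

# Trimmed copy of the sample response shown above.
SAMPLE = """
{
  "module": "myapp.core.engine",
  "dependent_count": 8,
  "impact": {"direct": 8, "transitive": 12},
  "metrics": {"health": "critical"},
  "insights": ["extreme_cyclomatic", "critical_blast_radius"],
  "risk_profile": "hub",
  "in_cycles": false
}
"""


def summarize_context(raw: str) -> dict:
    """Reduce a file-context response to the facts an agent checks before editing."""
    ctx = json.loads(raw)
    health = ctx.get("metrics", {}).get("health")
    return {
        "module": ctx["module"],
        "transitive_impact": ctx.get("impact", {}).get("transitive", 0),
        # Hypothetical policy: slow down for anything critical, cyclic,
        # or carrying a non-null risk profile.
        "needs_care": (
            health == "critical"
            or bool(ctx.get("in_cycles"))
            or ctx.get("risk_profile") is not None
        ),
        "insights": list(ctx.get("insights", [])),
    }


summary = summarize_context(SAMPLE)
print(summary["module"], summary["needs_care"], summary["transitive_impact"])
```

Because the labels are enumerated strings, a check like this needs no text parsing; how strict the policy should be is up to the agent.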

---

## Design Principles

Hawkeye is built specifically for AI agent consumption:

| Principle | How |
|-----------|-----|
| **Deterministic** | Same code → same output. No randomness, no LLM in the loop. Pure AST + graph algorithms. |
| **Token-efficient** | Compact mode (default) strips verbose fields. A healthy module adds ~5 tokens. A problematic one adds ~30. Zero wasted tokens on modules with no issues. |
| **One-call context** | `hawkeye_file_context` replaces 5+ separate queries. One tool call = full architectural picture. |
| **Fast** | Single-pass AST parsing. 281 modules analyzed in ~5 seconds. Results cached for the session. |
| **Zero dependencies** | Core analysis uses Python stdlib only. No transitive dependency hell. Installs in under a second. |
| **Machine-readable** | Every output is structured JSON. Insight codes are enumerated strings, not natural language. Risk profiles are single-token labels. |

### Token Budget

| Scenario | Tokens added to context |
|----------|------------------------|
| Healthy module, no issues | ~80 tokens |
| Module with warnings | ~150 tokens |
| Critical module with cycles | ~250 tokens |
| Batch context (3 files) | ~400 tokens |

Compare this to dumping raw `import` statements or `grep` results — Hawkeye gives the AI **structured, pre-analyzed** architectural data at a fraction of the token cost.

---

## MCP Tools Reference

Call `hawkeye_analyze(project_path)` once to scan the project; the nine query tools below then work against the cached result:

| Tool | Purpose | When to use |
|------|---------|-------------|
| **`hawkeye_file_context(file)`** | Everything about a file — deps, dependents, impact, cycles, health, insights, risk | **Before editing any file** |
| `hawkeye_context(files)` | Combined context for multi-file edits — shared deps, combined blast radius | Before editing 2+ related files |
| `hawkeye_impact(file, symbol)` | Symbol-level blast radius — who uses `class Engine`? | Before renaming/refactoring a class or function |
| `hawkeye_symbols(file)` | List all classes/functions with usage counts | Understanding what a module exports |
| `hawkeye_find(pattern)` | Search modules by name | Discovering module names |
| `hawkeye_cycles()` | All import cycles with severity and break suggestions | Checking for circular dependencies |
| `hawkeye_metrics(sort_by, limit)` | Coupling + complexity table for all modules | Finding the riskiest modules |
| `hawkeye_path(source, target)` | Shortest dependency path between two modules | Understanding how modules are connected |
| `hawkeye_graph(max_depth)` | Full dependency graph as JSON | Structural overview |

### Recommended Agent Workflow

```
1. hawkeye_analyze("/path/to/project")     — scan once on startup
2. hawkeye_file_context("file_to_edit.py") — before every edit
3. hawkeye_impact("file.py", "ClassName")  — before refactoring a symbol
4. hawkeye_cycles()                         — after creating new imports
```

---

## Interpreting the Output

### Insight Codes

Machine-readable labels derived deterministically from metrics. No natural language, no ambiguity:

| Code | Severity | What it means |
|------|----------|---------------|
| `high_instability` | warning | Many outgoing deps, few incoming — volatile |
| `highly_stable` | info | Many incoming deps — changes here propagate widely |
| `high_efferent` | warning | Depends on too many modules |
| `high_afferent` | warning | Too many modules depend on this |
| `extreme_cyclomatic` | critical | Very high branching complexity (CC ≥ 50) |
| `extreme_cognitive` | critical | Deeply nested control flow (Cog ≥ 50) |
| `high_cyclomatic` | warning | Elevated branching complexity (CC ≥ 20) |
| `high_cognitive` | warning | Moderately nested control flow (Cog ≥ 25) |
| `critical_blast_radius` | critical | ≥10 modules directly depend on this |
| `high_blast_radius` | warning | ≥5 modules directly depend on this |
| `very_large_module` | warning | ≥500 LOC |
| `in_cycle` | critical/warning | Involved in an import cycle |
| `zone_of_pain` | warning | Concrete + stable = rigid, hard to extend |
| `zone_of_uselessness` | warning | Abstract + unstable = possibly unused abstractions |
| `well_balanced` | info | On the main sequence (good A/I balance) |
| `isolated` | info | No internal dependencies or dependents |
| `high_fan_out` | info | Imports many modules (high coordination surface) |
| `wide_transitive_reach` | info | Transitive impact much wider than direct |

### Risk Profiles

Single-token classification of a module's structural role:

| Label | Meaning | Agent should... |
|-------|---------|-----------------|
| `hub` | High dependents + high complexity | Edit with extreme care — many things break |
| `tangled` | Involved in import cycles | Fix the cycle before adding more imports |
| `fragile` | High complexity + high instability | Likely to break — add tests first |
| `volatile` | High instability + many outgoing deps | Unstable foundation — minimize changes |
| `amplifier` | Changes cascade widely (transitive >> direct) | Check transitive dependents before editing |
| `null` | No structural risk | Safe to edit freely |

### Health Labels

Three-state composite assessment:

| Label | Meaning | Thresholds (default profile) |
|-------|---------|------------------------------|
| `healthy` | No coupling or complexity concerns | CC < 20, Cog < 25, Ca < 8, Ce < 8 |
| `warning` | Elevated risk in one dimension | CC ≥ 20 or Cog ≥ 25 or coupling ≥ 8 |
| `critical` | Multiple risk factors or extreme values | CC ≥ 50 or Cog ≥ 50 or both coupling high |
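
Read literally, the thresholds column can be sketched as a small classifier. This is an interpretation rather than Hawkeye's actual scoring: `classify_health` is a hypothetical name, "both coupling high" is taken to mean Ca and Ce both at or above 8, and "multiple risk factors" is not modelled beyond what the column states.

```python
def classify_health(
    cc: int,
    cog: int,
    ca: int,
    ce: int,
    *,
    cc_high: int = 20,
    cc_critical: int = 50,
    cog_high: int = 25,
    cog_critical: int = 50,
    ca_high: int = 8,
    ce_high: int = 8,
) -> str:
    """Map complexity and coupling metrics to a health label."""
    afferent_high = ca >= ca_high
    efferent_high = ce >= ce_high

    # Critical: an extreme complexity value, or both coupling metrics high.
    if cc >= cc_critical or cog >= cog_critical or (afferent_high and efferent_high):
        return "critical"
    # Warning: any single dimension past its warning threshold.
    if cc >= cc_high or cog >= cog_high or afferent_high or efferent_high:
        return "warning"
    return "healthy"


for metrics in [(10, 5, 2, 3), (22, 5, 2, 3), (22, 5, 9, 9), (60, 5, 0, 0)]:
    print(metrics, classify_health(*metrics))
```

The keyword defaults mirror the default profile; passing the strict or relaxed values changes the labels without changing the logic.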

---

## CLI for Humans

Hawkeye also works as a standalone CLI:

```bash
pip install hawkeye-analyzer

# Full project analysis
hawkeye analyze ./myproject

# Interactive dependency graph in your browser
hawkeye show ./myproject

# Metrics deep-dive with per-function complexity
hawkeye metrics ./myproject --sort health --functions

# Symbol blast radius
hawkeye impact ./myproject src/engine.py -s Engine

# CI gate — fails on rule violations or import cycles
hawkeye check ./myproject --no-cycles

# AI-ready JSON context
hawkeye context ./myproject src/engine.py
```

### Output Formats

| Command | Formats |
|---------|---------|
| `hawkeye analyze` | `--format text` (default), `json`, `html`, `dot` |
| `hawkeye metrics` | text (default), `--json`, `--functions` |
| `hawkeye impact` | text (default), `--json`, `--hotspots`, `--unused` |
| `hawkeye context` | JSON only (designed for machine consumption) |

---

## Configuration

Place a `hawkeye.toml` in your project root. Hawkeye auto-discovers it by walking up from the project directory.

### Minimal Setup

```toml
[project]
name = "MyProject"

[scan]
exclude_patterns = ["*.tests.*", "*.test_*"]  # Keep test modules out of coupling analysis
```

### Architecture Rules

```toml
# Enforce layered architecture
[rules.layers]
order = ["models", "services", "api", "cli"]
direction = "downward"

# Block specific imports
[[rules.forbidden]]
from = "api.*"
to = ["cli.*", "scripts.*"]

# Module groups must be independent (transitive — catches indirect paths too)
[[rules.independence]]
modules = ["auth", "billing", "notifications"]

# Only auth may import secrets
[[rules.protected]]
modules = ["core.secrets", "core.tokens"]
allowed_importers = ["auth.*"]

# Sibling services must not form cycles
[[rules.acyclic_siblings]]
ancestor = "services"
```
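
To illustrate what a `forbidden` rule expresses, the following sketch checks a list of import edges against one rule. It assumes glob-style matching on dotted module names, which is a guess based on the `api.*` patterns above; the function name and edge format are hypothetical and Hawkeye's matcher may behave differently.

```python
from fnmatch import fnmatchcase

# One [[rules.forbidden]] entry, as it would look once parsed into a dict.
RULE = {"from": "api.*", "to": ["cli.*", "scripts.*"]}

# Import edges as (importer, imported) pairs; sample data for illustration.
EDGES = [
    ("api.users", "cli.main"),        # violates the rule
    ("api.users", "models.user"),     # allowed: target not forbidden
    ("services.billing", "cli.main"), # allowed: importer is outside api.*
]


def forbidden_violations(edges, rule):
    """Return every edge whose importer and target both match the rule."""
    return [
        (src, dst)
        for src, dst in edges
        if fnmatchcase(src, rule["from"])
        and any(fnmatchcase(dst, pattern) for pattern in rule["to"])
    ]


violations = forbidden_violations(EDGES, RULE)
print(violations)
```

Note that under this matching a bare `api` module would not match `api.*`; whether Hawkeye treats the package itself as covered is not stated here.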

### Threshold Tuning

All 19 thresholds are configurable. Choose a profile, then override individual values:

```toml
[thresholds]
profile = "strict"    # "default", "strict", or "relaxed"
cc_critical = 40      # Override: relax cyclomatic critical for this project
loc_critical = 600    # Override: allow larger modules
```

| Profile | CC warn/crit | Cog warn/crit | LOC warn/crit | Dependents warn/crit |
|---------|-------------|---------------|---------------|---------------------|
| **default** | 20 / 50 | 25 / 50 | 300 / 500 | 5 / 10 |
| **strict** | 10 / 30 | 15 / 30 | 200 / 300 | 3 / 5 |
| **relaxed** | 30 / 80 | 40 / 80 | 500 / 1000 | 10 / 20 |

The active profile is embedded in JSON output (`threshold_profile` field) for reproducibility.
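
The "profile first, then overrides" behaviour can be pictured as a dictionary merge. The sketch below uses the eight values from the profile table; `resolve_thresholds` is a hypothetical helper, not Hawkeye's API, and it takes the already-parsed `[thresholds]` table as a plain dict.

```python
# Values copied from the profile table above (eight of the threshold keys).
PROFILES = {
    "default": {
        "cc_high": 20, "cc_critical": 50,
        "cog_high": 25, "cog_critical": 50,
        "loc_high": 300, "loc_critical": 500,
        "dependents_high": 5, "dependents_critical": 10,
    },
    "strict": {
        "cc_high": 10, "cc_critical": 30,
        "cog_high": 15, "cog_critical": 30,
        "loc_high": 200, "loc_critical": 300,
        "dependents_high": 3, "dependents_critical": 5,
    },
    "relaxed": {
        "cc_high": 30, "cc_critical": 80,
        "cog_high": 40, "cog_critical": 80,
        "loc_high": 500, "loc_critical": 1000,
        "dependents_high": 10, "dependents_critical": 20,
    },
}


def resolve_thresholds(section: dict) -> dict:
    """Start from the named profile, then apply individual overrides."""
    overrides = dict(section)  # copy so the caller's dict is not mutated
    profile = overrides.pop("profile", "default")
    if profile not in PROFILES:
        raise ValueError(f"unknown threshold profile: {profile!r}")
    return {"threshold_profile": profile, **PROFILES[profile], **overrides}


# Equivalent of the [thresholds] table in the example above.
resolved = resolve_thresholds(
    {"profile": "strict", "cc_critical": 40, "loc_critical": 600}
)
print(resolved["cc_high"], resolved["cc_critical"], resolved["loc_critical"])
```

Recording the profile name alongside the resolved values is what makes a report reproducible later.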

<details>
<summary>All 19 threshold keys</summary>

| Key | Default | Controls |
|-----|---------|----------|
| `instability_high` | 0.8 | `high_instability` insight trigger |
| `instability_low` | 0.2 | `highly_stable` insight trigger |
| `ce_high` | 8 | Efferent coupling warning |
| `ca_high` | 8 | Afferent coupling warning |
| `cc_high` | 20 | Cyclomatic → warning |
| `cc_critical` | 50 | Cyclomatic → critical |
| `cog_high` | 25 | Cognitive → warning |
| `cog_critical` | 50 | Cognitive → critical |
| `loc_high` | 300 | `large_module` insight |
| `loc_critical` | 500 | `very_large_module` insight |
| `dependents_high` | 5 | Blast radius → warning |
| `dependents_critical` | 10 | Blast radius → critical |
| `dependencies_high` | 6 | `high_fan_out` insight |
| `cycle_size_high` | 4 | Cycle → critical severity |
| `distance_high` | 0.5 | Zone of pain / uselessness trigger |
| `distance_low` | 0.2 | `well_balanced` trigger |
| `abstract_high` | 0.8 | Highly abstract classification |
| `abstract_low` | 0.2 | Concrete classification |

</details>

---

## How It Works

```
Python files → AST parsing (single pass) → Import resolution → Dependency graph
                                                                      ↓
                    Symbol registry ← Symbol extraction     Graph algorithms
                         ↓                                        ↓
                  Symbol-level impact              Coupling metrics (Ca/Ce/I/A/D)
                  Hotspot detection                Complexity metrics (CC/Cog)
                  Dead code detection              Cycle detection (Tarjan's SCC)
                                                   Health classification
                                                   Insight derivation
                                                         ↓
                                              Deterministic JSON output
```

- **Single AST pass** per file — no re-parsing, no multiple traversals
- **Tarjan's SCC** for cycle detection, which runs in O(V+E) time (linear in the number of modules plus import edges)
- **BFS reachability** for transitive impact — cached per session
- **Robert C. Martin's metrics** — Ca, Ce, Instability, Abstractness, Distance
- **SonarSource spec** for cognitive complexity — nesting-weighted, not just branch counting
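
For readers unfamiliar with it, Tarjan's algorithm can be sketched in a few dozen lines. This is a textbook recursive version over a plain adjacency dict, written for illustration; it is not the code in `cycles.py`, and the function names and graph format are assumptions.

```python
def strongly_connected_components(graph: dict[str, list[str]]) -> list[list[str]]:
    """Tarjan's algorithm: one depth-first pass, O(V + E)."""
    index_of: dict[str, int] = {}   # discovery order of each node
    lowlink: dict[str, int] = {}    # lowest index reachable via the DFS subtree
    stack: list[str] = []
    on_stack: set[str] = set()
    components: list[list[str]] = []

    def visit(node: str) -> None:
        index_of[node] = lowlink[node] = len(index_of)
        stack.append(node)
        on_stack.add(node)
        for successor in graph.get(node, []):
            if successor not in index_of:
                visit(successor)
                lowlink[node] = min(lowlink[node], lowlink[successor])
            elif successor in on_stack:
                # Back edge into the component currently being built.
                lowlink[node] = min(lowlink[node], index_of[successor])
        if lowlink[node] == index_of[node]:
            # `node` is the root of a component: pop its members off the stack.
            component: list[str] = []
            while True:
                member = stack.pop()
                on_stack.discard(member)
                component.append(member)
                if member == node:
                    break
            components.append(component)

    for node in graph:
        if node not in index_of:
            visit(node)
    return components


def import_cycles(graph: dict[str, list[str]]) -> list[list[str]]:
    """Keep only components that are real cycles (size > 1 or a self-import)."""
    return [
        sorted(component)
        for component in strongly_connected_components(graph)
        if len(component) > 1 or component[0] in graph.get(component[0], [])
    ]


# Sample import graph: a -> b -> c -> a forms a cycle; d and e do not.
GRAPH = {"a": ["b"], "b": ["c"], "c": ["a"], "d": ["a"], "e": []}
cycles = import_cycles(GRAPH)
print(cycles)
```

A recursive formulation is the easiest to read but can hit Python's recursion limit on very deep import chains; an iterative variant avoids that at the cost of clarity.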

---

## Performance

| Metric | Value |
|--------|-------|
| 281 modules, 58K LOC | ~5 seconds full analysis |
| Incremental queries after analysis | <10ms per call |
| Memory | Graph + metrics cached in-process |
| Install time | <1 second (zero dependencies) |
| MCP server startup with pre-analysis | ~6 seconds |

---

## Project Structure

```
src/hawkeye/
├── engine.py           # Central orchestrator — the main API
├── config.py           # TOML config with walk-up discovery
├── cli.py              # 7 CLI commands
├── core/
│   ├── scanner.py      # File discovery + LOC
│   ├── analyzer.py     # AST imports + symbols + complexity
│   ├── graph.py        # Directed graph + algorithms
│   ├── metrics.py      # Ca/Ce/I/A/D + health scoring
│   ├── cycles.py       # Tarjan's SCC + severity
│   ├── rules.py        # 5 architecture rule types
│   ├── insights.py     # Deterministic insight derivation
│   └── symbols.py      # Cross-file symbol resolution
├── server/
│   └── mcp.py          # 10 MCP tools
└── visualizer/
    ├── html_renderer.py    # Interactive D3.js graph
    ├── dot_renderer.py     # Graphviz DOT
    ├── text_renderer.py    # Terminal tables
    └── json_renderer.py    # Structured JSON
```

271 tests across 11 test files. Zero required dependencies. Python 3.10+.

## License

MIT — see [LICENSE](LICENSE) for details.
