Metadata-Version: 2.4
Name: neuro-scan
Version: 0.2.2
Summary: LLM Neuroanatomy Explorer — map what each transformer layer does
Project-URL: Homepage, https://github.com/XXO47OXX/neuro-scan
Project-URL: Documentation, https://github.com/XXO47OXX/neuro-scan#readme
Project-URL: Issues, https://github.com/XXO47OXX/neuro-scan/issues
Project-URL: Changelog, https://github.com/XXO47OXX/neuro-scan/blob/main/CHANGELOG.md
Author: XXO47OXX
License-Expression: MIT
License-File: LICENSE
License-File: NOTICE
Keywords: attention-analysis,interpretability,layer-ablation,llm,logit-lens,mechanistic-interpretability,neuroanatomy,transformer
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Requires-Python: >=3.10
Requires-Dist: numpy>=1.24.0
Requires-Dist: plotly>=5.18.0
Requires-Dist: rich>=13.0.0
Requires-Dist: safetensors>=0.4.0
Requires-Dist: scikit-learn>=1.3.0
Requires-Dist: torch>=2.1.0
Requires-Dist: tqdm>=4.65.0
Requires-Dist: transformers>=4.36.0
Requires-Dist: typer>=0.9.0
Provides-Extra: dev
Requires-Dist: pytest-cov>=4.1.0; extra == 'dev'
Requires-Dist: pytest>=7.4.0; extra == 'dev'
Requires-Dist: ruff>=0.1.0; extra == 'dev'
Provides-Extra: exllamav2
Requires-Dist: exllamav2>=0.1.0; extra == 'exllamav2'
Provides-Extra: lookup
Requires-Dist: datasets>=2.0.0; extra == 'lookup'
Description-Content-Type: text/markdown

<h1 align="center">neuro-scan</h1>
<p align="center">
  <strong>LLM Neuroanatomy Explorer — map what each transformer layer does</strong>
</p>

<p align="center">
  <a href="https://pypi.org/project/neuro-scan/"><img src="https://img.shields.io/pypi/v/neuro-scan?color=blue" alt="PyPI"></a>
  <a href="https://pypi.org/project/neuro-scan/"><img src="https://img.shields.io/pypi/pyversions/neuro-scan" alt="Python"></a>
  <a href="https://github.com/XXO47OXX/neuro-scan/blob/main/LICENSE"><img src="https://img.shields.io/github/license/XXO47OXX/neuro-scan" alt="License"></a>
  <a href="https://github.com/XXO47OXX/neuro-scan/actions/workflows/ci.yml"><img src="https://github.com/XXO47OXX/neuro-scan/actions/workflows/ci.yml/badge.svg" alt="CI"></a>
</p>

<p align="center">
  <a href="https://img.shields.io/badge/Python-3.10+-blue.svg"><img src="https://img.shields.io/badge/Python-3.10+-blue.svg" alt="Python"></a>
  <a href="https://pytorch.org/"><img src="https://img.shields.io/badge/PyTorch-2.1+-ee4c2c.svg" alt="PyTorch"></a>
  <a href="https://huggingface.co/docs/transformers"><img src="https://img.shields.io/badge/HuggingFace-Transformers-yellow.svg" alt="HuggingFace"></a>
  <a href="https://plotly.com/"><img src="https://img.shields.io/badge/Plotly-Heatmaps-3F4F75.svg" alt="Plotly"></a>
  <a href="https://github.com/turboderp-org/exllamav2"><img src="https://img.shields.io/badge/ExLlamaV2-Quantized-green.svg" alt="ExLlamaV2"></a>
</p>

---

## Ecosystem

| Tool | What it does | Question it answers |
|------|-------------|-------------------|
| [**layer-scan**](https://github.com/XXO47OXX/layer-scan) | Find optimal layer duplication config | **What to do** — which layers to duplicate |
| **neuro-scan** | Map what each layer does | **Why it works** — understand layer functions |

> If you already use layer-scan, neuro-scan is the natural companion: understand your model's layers before you duplicate them.

### Ablation Sensitivity

<p align="center">
  <img src="docs/ablation-screenshot.png" alt="neuro-scan ablation chart — Qwen2-1.5B / math probe" width="800">
</p>

<p align="center"><em>Layer ablation sensitivity for Qwen2-1.5B with math probe. Bars colored by auto-detected function (reasoning, syntax, etc.). Gold stars mark the most critical layers.</em></p>

### Logit Lens Trajectory

<p align="center">
  <img src="docs/logit-lens-screenshot.png" alt="neuro-scan logit lens — Qwen2-1.5B / math probe" width="800">
</p>

<p align="center"><em>Logit lens heatmap showing when the correct answer token emerges across layers. Red diamonds mark the emergence point for each sample.</em></p>

## Features

- **Layer Ablation** — zero out each layer one-by-one, measure the score impact
- **Logit Lens** — project each layer's hidden state to vocabulary space, watch the answer emerge
- **Tuned Lens** — per-layer affine probes that reduce early-layer bias by 4-5 bits (Belrose 2023)
- **Attention Entropy** — quantify how focused or diffuse each attention head is
- **Circuit Detection** — find synergistic and redundant layer pairs via targeted pairwise ablation
- **Block Influence** — one forward pass to estimate all layers' importance (ShortGPT BI metric)
- **Cross-probe Analysis** — identify universal vs probe-specific important layers
- **Multi-model Comparison** — compare neuroanatomy across different models
- **Auto Layer Labeling** — automatically classify layers as early_processing, syntax, reasoning, semantic_processing, formatting, or output
- **Prompt Repetition Experiment** — test whether repeating a prompt N times approximates duplicating K layers
- **Interactive HTML Charts** — Plotly-powered visualizations for all analysis types
- **Pre-computed Fetch** — download community neuroanatomy reports from HuggingFace Hub (no GPU needed)

## Installation

```bash
# pipx (recommended, isolated env)
pipx install neuro-scan

# pip
pip install neuro-scan

# For pre-computed report fetch (no GPU required):
pip install "neuro-scan[lookup]"
```

## Quick Start

```bash
# Full neuroanatomy map (recommended)
neuro-scan map --model <path-or-hf-id> --probe math

# Individual analyses
neuro-scan ablate --model <path> --probe math
neuro-scan logit-lens --model <path> --probe math
neuro-scan attention --model <path> --probe math

# Circuit detection
neuro-scan circuit --model <path> --probe math --strategy fast

# Cross-probe analysis
neuro-scan cross-probe --model <path> --probes "math,eq,json"

# Multi-model comparison
neuro-scan compare report_a.json report_b.json

# Tuned Lens (two-step workflow)
neuro-scan calibrate --model <path> --output lens.safetensors
neuro-scan logit-lens --model <path> --tuned-lens lens.safetensors

# Fetch pre-computed results (no GPU needed)
neuro-scan fetch --model Qwen/Qwen2-7B --probe math

# Prompt repetition experiment
neuro-scan prompt-repeat --model <path> --probe math --repeat-counts 1,2,3,4

# Utilities
neuro-scan probes
neuro-scan version
```

## Commands

| Command | Description |
|---------|-------------|
| `map` | Full neuroanatomy (ablation + logit lens + attention + labeling) |
| `ablate` | Layer ablation sensitivity scan |
| `logit-lens` | Logit lens trajectory, optional tuned lens |
| `attention` | Attention entropy analysis (experimental) |
| `circuit` | Detect synergistic/redundant layer pair circuits |
| `cross-probe` | Compare layer importance across multiple probes |
| `compare` | Compare neuroanatomy across multiple models |
| `calibrate` | Train tuned lens affine probes |
| `fetch` | Download pre-computed reports from HuggingFace Hub |
| `prompt-repeat` | Prompt repetition experiment |
| `probes` | List available evaluation probes |
| `version` | Show version |

### Common Options

| Option | Type | Default | Description |
|--------|------|---------|-------------|
| `--model`, `-m` | str | required | Model path or HuggingFace ID |
| `--probe`, `-p` | str | `math` | Probe: math, eq, json, custom |
| `--backend`, `-b` | str | `transformers` | Backend: transformers, exllamav2 |
| `--batch-size` | int | `16` | Samples per evaluation |
| `--output`, `-o` | str | `./results` | Output directory |
| `--top-k`, `-k` | int | `10` | Top layers to highlight |
| `--dtype` | str | `float16` | Model dtype |
| `--verbose`, `-v` | bool | `false` | Verbose logging |

## Circuit Detection

The `circuit` command goes beyond single-layer ablation to find **interacting layer pairs** — layers that cooperate (synergistic) or overlap (redundant).

### Three-phase Pipeline

1. **Phase A — Candidate Selection**: Uses single-layer ablation results to identify the top-K most sensitive layers
2. **Phase B — Similarity Filtering** (thorough mode): Computes cosine similarity between layer representations to identify structurally related pairs
3. **Phase C — Pairwise Ablation**: Tests candidate pairs by ablating both layers simultaneously

### Interaction Types

- **Synergistic** (interaction > 0): Ablating both layers together causes *more* damage than the sum of individual ablations — these layers cooperate
- **Redundant** (interaction < 0): Ablating both causes *less* damage than expected — these layers have overlapping function
- **Independent** (interaction ≈ 0): Layers function independently
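
As a concrete reading of the interaction score, here is a minimal sketch (illustrative names, not neuro-scan's internal API) that assumes `delta_a`, `delta_b`, and `delta_ab` are the score drops measured when ablating layer *a* alone, layer *b* alone, and both together:

```python
# Illustrative sketch, not neuro-scan's internal API. `delta_a`, `delta_b`,
# and `delta_ab` are score drops (baseline minus ablated score) measured for
# layer a alone, layer b alone, and both together.

def classify_interaction(delta_a: float, delta_b: float, delta_ab: float,
                         tol: float = 0.01) -> tuple[float, str]:
    """Interaction = joint damage minus the sum of individual damages."""
    interaction = delta_ab - (delta_a + delta_b)
    if interaction > tol:
        return interaction, "synergistic"  # together they break more than expected
    if interaction < -tol:
        return interaction, "redundant"    # overlapping function absorbs the hit
    return interaction, "independent"
```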

### Strategy Options

| Strategy | Pairs Tested | Speed | Use Case |
|----------|-------------|-------|----------|
| `fast` | Top-K pairs + adjacent | ~100 evals | Quick overview |
| `thorough` | + similarity-filtered | ~150 evals | Standard analysis |
| `exhaustive` | All L(L-1)/2 pairs | ~8000 evals | Complete picture |

```bash
neuro-scan circuit --model <path> --probe math --strategy fast --top-k-pairs 10
```

Output: `circuit.json` with all interaction results, synergistic pairs, and redundant pairs.

## Tuned Lens

Standard logit lens applies the final layer's RMSNorm to intermediate hidden states, causing a systematic 4-5 bit bias in early layers. **Tuned lens** (Belrose et al. 2023) trains a per-layer affine probe to correct this.

### Two-step Workflow

```bash
# Step 1: Train the tuned lens (~minutes, single GPU)
neuro-scan calibrate --model <path> --output lens.safetensors --steps 250

# Step 2: Use it with logit-lens or map
neuro-scan logit-lens --model <path> --tuned-lens lens.safetensors
```

### How It Works

Each layer gets an affine translator `A_l * h_l + b_l` (initialized to identity + zero bias). Training minimizes KL divergence between the translated hidden state's logits and the final layer's logits using SGD with Nesterov momentum.
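
The following is a minimal PyTorch sketch of one translator in that spirit, assuming an `unembed` function (the model's final norm plus LM head) and hidden states `h_l` are available from the host model; `d_model` and the hyperparameters are illustrative, not neuro-scan's exact settings:

```python
import torch

# Minimal sketch of one tuned-lens translator (Belrose et al. 2023), not
# neuro-scan's exact implementation. Assumes `unembed(h)` applies the model's
# final norm + LM head; d_model and hyperparameters are illustrative.

class Translator(torch.nn.Module):
    def __init__(self, d_model: int):
        super().__init__()
        self.linear = torch.nn.Linear(d_model, d_model)
        torch.nn.init.eye_(self.linear.weight)   # start as identity...
        torch.nn.init.zeros_(self.linear.bias)   # ...plus zero bias

    def forward(self, h):
        return self.linear(h)

translator = Translator(d_model=1536)
opt = torch.optim.SGD(translator.parameters(), lr=1.0, momentum=0.9, nesterov=True)

def train_step(h_l, final_logits, unembed):
    """One step: match translated layer-l logits to the final layer's logits."""
    logits = unembed(translator(h_l))
    loss = torch.nn.functional.kl_div(
        torch.log_softmax(logits, dim=-1),    # input: log-probabilities
        torch.softmax(final_logits, dim=-1),  # target: probabilities
        reduction="batchmean",
    )
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()
```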

### File Size Reference

| Model Size | d_model | Layers | Lens File |
|-----------|---------|--------|-----------|
| 1.5B | 1536 | 28 | ~260 MB |
| 7B | 4096 | 32 | ~2 GB |
| 70B | 8192 | 80 | ~21 GB |

## Block Influence

**Block Influence** (ShortGPT, ACL 2025) measures each layer's contribution in a single forward pass:

```
BI(layer) = 1 - cos_sim(input_hidden_state, output_hidden_state)
```

- **High BI** = layer significantly transforms the representation (critical layer)
- **Low BI** = layer barely changes anything (potentially redundant)

Block Influence is computed automatically during `map` and reported alongside ablation results. It serves as a fast O(1) proxy for the O(L) ablation scan.
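
A minimal sketch of the BI computation, assuming `hidden_states` is the tuple a Hugging Face model returns with `output_hidden_states=True` (length `num_layers + 1`, where consecutive entries bracket one layer):

```python
import torch

# Sketch assuming `hidden_states` is the tuple a Hugging Face model returns
# with output_hidden_states=True: num_layers + 1 tensors of shape
# [batch, seq, d_model], where consecutive entries bracket one layer.

def block_influence(hidden_states):
    scores = []
    for h_in, h_out in zip(hidden_states[:-1], hidden_states[1:]):
        cos = torch.nn.functional.cosine_similarity(h_in, h_out, dim=-1)
        scores.append(1.0 - cos.mean().item())  # high = layer rewrites the representation
    return scores
```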

## Cross-probe Analysis

The `cross-probe` command runs ablation scans for multiple probes and identifies:

- **Universal layers** — important for *all* probes (appear in top-K for every probe)
- **Probe-specific layers** — important only for particular tasks
- **Correlation matrix** — how similar the layer importance profiles are between probes

```bash
neuro-scan cross-probe --model <path> --probes "math,eq,json" --top-k 10
```

Universal layers are strong candidates for duplication (their importance is task-general), while probe-specific layers explain task-dependent behavior.

Output: `cross_probe.json` with per-probe ablation deltas, universal layers, and correlation matrix.
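
As an illustration of the universal-layer criterion (a hypothetical helper, not the package API): a layer is universal exactly when it appears in every probe's top-K set.

```python
# Hypothetical helper, not the package API: a layer is "universal" exactly
# when it appears in every probe's top-K set.

def universal_layers(top_k_by_probe: dict[str, set[int]]) -> set[int]:
    sets = list(top_k_by_probe.values())
    return set.intersection(*sets) if sets else set()

print(universal_layers({
    "math": {5, 17, 23},
    "eq":   {5, 9, 17},
    "json": {5, 12, 17},
}))  # layers 5 and 17 matter for every probe
```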

## Multi-model Comparison

The `compare` command takes two or more `report.json` files and produces:

- **Similarity matrix** — how similar are the neuroanatomy profiles
- **Shared reasoning layers** — layers with the same function across models (normalized by position)
- **Model rankings** — which model has the highest mean ablation sensitivity

```bash
neuro-scan compare report_a.json report_b.json report_c.json
```

Output: `comparison.json` with all comparison metrics.
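
For intuition, a profile similarity can be computed roughly as below. This is a hedged sketch: it assumes each `report.json` stores a per-layer ablation-delta list under a key like `"ablation_deltas"` (the real schema may differ), and it resamples profiles to a common length so models with different layer counts stay comparable.

```python
import json
import numpy as np

# Hedged sketch: assumes each report.json stores a per-layer ablation-delta
# list under a key like "ablation_deltas" (the real schema may differ).
# Profiles are resampled to a common length so models with different layer
# counts stay comparable (position-normalized, as in the shared-layer check).

def load_profile(path: str, n: int = 32) -> np.ndarray:
    with open(path) as f:
        deltas = np.asarray(json.load(f)["ablation_deltas"], dtype=float)
    return np.interp(np.linspace(0, 1, n), np.linspace(0, 1, len(deltas)), deltas)

def profile_similarity(path_a: str, path_b: str) -> float:
    a, b = load_profile(path_a), load_profile(path_b)
    return float(np.corrcoef(a, b)[0, 1])  # Pearson correlation of the two profiles
```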

## Pre-computed Fetch

The `fetch` command downloads community-contributed neuroanatomy reports from HuggingFace Hub — **no GPU required**.

```bash
# Show pre-computed report summary
neuro-scan fetch --model Qwen/Qwen2-7B --probe math

# Download full report.json locally
neuro-scan fetch --model Qwen/Qwen2-7B --probe math --output report.json
```

Requires the `lookup` extra: `pip install "neuro-scan[lookup]"`

Results are sourced from the `XXO47OXX/neuro-scan-results` HuggingFace dataset.
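
If you prefer to pull a report programmatically, `huggingface_hub` can download files from that dataset directly; note the file layout inside the repo shown here is a guess, and the `fetch` command resolves the correct path for you.

```python
from huggingface_hub import hf_hub_download

# Hedged sketch of a manual download; the file layout inside the dataset repo
# is a guess (the `fetch` command resolves the correct path for you).
path = hf_hub_download(
    repo_id="XXO47OXX/neuro-scan-results",
    filename="Qwen/Qwen2-7B/math/report.json",  # hypothetical layout
    repo_type="dataset",
)
```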

## Output Files

Running `neuro-scan map` generates:

| File | Content |
|------|---------|
| `ablation.html` | Interactive ablation sensitivity bar chart |
| `logit_lens.html` | Logit lens trajectory heatmap |
| `attention.html` | Attention entropy heatmap |
| `entropy_profile.html` | Layer-by-layer entropy profile chart |
| `report.json` | Full results in JSON format |
| `ablation.csv` | Ablation results as CSV |

Additional output files from specific commands:

| File | Command | Content |
|------|---------|---------|
| `circuit.json` | `circuit` | Synergistic/redundant layer pair interactions |
| `cross_probe.json` | `cross-probe` | Per-probe ablation deltas and correlation matrix |
| `comparison.json` | `compare` | Multi-model neuroanatomy comparison |
| `multi_probe.json` | `cross-probe` | Cross-probe analysis results |

## Auto Layer Labeling

neuro-scan automatically classifies each layer's function using a multi-signal algorithm:

| Label | Description | How Detected |
|-------|-------------|-------------|
| `early_processing` | Input embedding, token processing | First ~10% of layers |
| `syntax` | Grammatical patterns, structure | Before logit lens emergence |
| `reasoning` | Task-critical computation | Top-k ablation sensitivity |
| `semantic_processing` | Knowledge retrieval, understanding | Middle layers (default) |
| `formatting` | Response structuring | After emergence, before output |
| `output` | Final token selection | Last ~10% of layers |

Labels are suggestions based on automated analysis. The algorithm combines:
1. **Position heuristics** — layer position within the model
2. **Ablation sensitivity** — which layers cause the most score drop when removed
3. **Logit lens emergence** — when the correct answer token first appears
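
Put together, the decision logic looks roughly like the sketch below; the thresholds and precedence are illustrative assumptions, not neuro-scan's exact rules.

```python
# Illustrative sketch of the decision logic; thresholds and precedence are
# assumptions, not neuro-scan's exact rules.

def label_layer(i: int, n_layers: int, sensitive: set[int],
                emergence: int | None) -> str:
    pos = i / n_layers
    if pos < 0.10:
        return "early_processing"        # first ~10% of layers
    if pos > 0.90:
        return "output"                  # last ~10% of layers
    if i in sensitive:
        return "reasoning"               # top-k ablation sensitivity
    if emergence is not None and i < emergence:
        return "syntax"                  # before logit lens emergence
    if emergence is not None and i >= emergence:
        return "formatting"              # after emergence, before output
    return "semantic_processing"         # middle-layer default
```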

## Probes

| Probe | Samples | What it tests |
|-------|---------|--------------|
| `math` | 16 | Arithmetic, geometry, calculus, probability |
| `eq` | 12 | Emotions, social cues, sarcasm, psychology |
| `json` | 10 | JSON extraction, escaping, schema compliance |
| `custom` | user-defined | Load from JSON file with `--custom-probe` |
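
A custom probe file might look like the following; the schema shown is a hypothetical illustration, not the documented format, so check the repository docs for the authoritative one.

```python
import json

# Hypothetical schema for a custom probe (not the documented format; check
# the repository docs). The idea: each sample pairs a prompt with the
# expected answer used for scoring.
probe = [
    {"prompt": "What is 17 * 3?", "answer": "51"},
    {"prompt": "What is the derivative of x^2?", "answer": "2x"},
]
with open("my_probe.json", "w") as f:
    json.dump(probe, f, indent=2)
# then: neuro-scan ablate --model <path> --probe custom --custom-probe my_probe.json
```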

## Backends

| Backend | GPU Required | Quantization | Attention Extraction |
|---------|-------------|-------------|---------------------|
| `transformers` | Recommended | No | Full support |
| `exllamav2` | Required | GPTQ/EXL2 | Not supported |

## Prompt Repetition Experiment

The `prompt-repeat` command tests a hypothesis from Concept C:

> Does repeating a prompt N times approximate the effect of duplicating K transformer layers?

```bash
neuro-scan prompt-repeat --model <path> --probe math --repeat-counts 1,2,3,4
```

If the results show that 2× repetition approximates duplicating K layers, this has implications for both layer duplication research and prompt engineering.
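
One plausible reading of the experiment's core loop, with a hypothetical `score` helper (the real command handles batching, scoring, and reporting):

```python
# One plausible reading of the core loop; `score` is a hypothetical helper
# returning a probe score, and the real command handles batching and reporting.

def prompt_repeat_scores(score, model, prompt, repeat_counts=(1, 2, 3, 4)):
    # Concatenate the prompt N times and record the score at each N.
    return {n: score(model, "\n".join([prompt] * n)) for n in repeat_counts}
```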

## layer-scan Integration

Use neuro-scan and [layer-scan](https://github.com/XXO47OXX/layer-scan) together for a complete workflow:

```bash
# Step 1: Understand what each layer does
neuro-scan map --model ./my-model --probe math

# Step 2: Find the optimal layer duplication config
layer-scan scan --model ./my-model --probe math --export-mergekit config.yaml

# Step 3: Annotate — overlay neuro-scan labels on layer-scan heatmap
layer-scan annotate --results results.json --neuro-report report.json

# The ablation chart from neuro-scan explains WHY certain
# layers are the best to duplicate (high reasoning sensitivity)
```

## Roadmap

- [x] **v0.1.0**: Core ablation, logit lens, attention entropy, auto labeling
- [x] **v0.2.0**: Scoring diagnostics (coverage), block influence, entropy profile
- [x] **v0.2.1**: Circuit detection, tuned lens, cross-probe, multi-model compare, cross-tool annotate
- [ ] **v0.3.0**: Pre-computed report database (`fetch` command), attribution patching
- [ ] **v1.0.0**: Web UI, real-time analysis dashboard

## References

- [Tuned Lens (Belrose et al. 2023)](https://arxiv.org/abs/2303.08112) — Affine probes for interpretable logit lens
- [ShortGPT (ACL 2025)](https://arxiv.org/abs/2403.03853) — Block Influence metric for layer importance
- [Entropy-Lens (arXiv 2025)](https://arxiv.org/abs/2502.16570) — Entropy profile visualization
- [Repeat Yourself: Layer Duplication](https://arxiv.org/abs/2502.01470) — Original RYS research
- [MergeKit](https://github.com/arcee-ai/mergekit) — Model merging toolkit

## Attribution & AI Policy

### Original Design

neuro-scan introduces the following innovations:
- **CLI-native neuroanatomy tool** — first tool combining ablation + logit lens + attention in one CLI
- **Automatic layer-function labeling** — multi-signal classification of layer roles
- **Circuit detection pipeline** — three-phase synergistic/redundant layer pair detection
- **Tuned lens native implementation** — no external dependency, trains on any model
- **Cross-probe universal layer identification** — find layers that matter for all tasks
- **Prompt repetition experiment** — built-in hypothesis testing for prompt engineering research
- **layer-scan ecosystem integration** — understand before you duplicate

### Fork & Derivative Works

If you fork or create derivative works, please:
1. Retain the copyright notice and NOTICE file
2. Attribute the original repository: https://github.com/XXO47OXX/neuro-scan

### AI Training

See `llms.txt` for AI training attribution requirements.

## License

MIT License. See [LICENSE](LICENSE) for details.
