Metadata-Version: 2.4
Name: layer-scan
Version: 0.2.1
Summary: Automated LLM layer duplication configuration scanner with heatmap visualization
Project-URL: Homepage, https://github.com/XXO47OXX/layer-scan
Project-URL: Documentation, https://github.com/XXO47OXX/layer-scan#readme
Project-URL: Issues, https://github.com/XXO47OXX/layer-scan/issues
Project-URL: Changelog, https://github.com/XXO47OXX/layer-scan/blob/main/CHANGELOG.md
Author: XXO47OXX
License-Expression: MIT
License-File: LICENSE
License-File: NOTICE
Keywords: layer-duplication,llm,optimization,rys,transformer
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Requires-Python: >=3.10
Requires-Dist: numpy>=1.24.0
Requires-Dist: plotly>=5.18.0
Requires-Dist: rich>=13.0.0
Requires-Dist: safetensors>=0.4.0
Requires-Dist: torch>=2.1.0
Requires-Dist: tqdm>=4.65.0
Requires-Dist: transformers>=4.36.0
Requires-Dist: typer>=0.9.0
Provides-Extra: dev
Requires-Dist: pytest-cov>=4.1.0; extra == 'dev'
Requires-Dist: pytest>=7.4.0; extra == 'dev'
Requires-Dist: ruff>=0.1.0; extra == 'dev'
Provides-Extra: exllamav2
Requires-Dist: exllamav2>=0.1.0; extra == 'exllamav2'
Provides-Extra: lookup
Requires-Dist: datasets>=2.0.0; extra == 'lookup'
Provides-Extra: vllm
Requires-Dist: vllm>=0.3.0; extra == 'vllm'
Description-Content-Type: text/markdown

<p align="center">
  <h1 align="center">layer-scan</h1>
  <p align="center">
    <strong>Automated LLM layer duplication config scanner — find the optimal (i,j) for any model + task</strong>
  </p>
  <p align="center">
    <a href="https://pypi.org/project/layer-scan/"><img src="https://img.shields.io/pypi/v/layer-scan" alt="PyPI"></a>
    <a href="https://pypi.org/project/layer-scan/"><img src="https://img.shields.io/pypi/pyversions/layer-scan" alt="Python"></a>
    <a href="https://github.com/XXO47OXX/layer-scan/blob/main/LICENSE"><img src="https://img.shields.io/github/license/XXO47OXX/layer-scan" alt="License"></a>
    <a href="https://github.com/XXO47OXX/layer-scan/actions"><img src="https://img.shields.io/github/actions/workflow/status/XXO47OXX/layer-scan/ci.yml" alt="CI"></a>
    <a href="https://codecov.io/gh/XXO47OXX/layer-scan"><img src="https://img.shields.io/codecov/c/github/XXO47OXX/layer-scan" alt="Coverage"></a>
  </p>
  <p align="center">
    <a href="https://python.org/"><img src="https://img.shields.io/badge/Python-3.10+-blue.svg" alt="Python"></a>
    <a href="https://pytorch.org/"><img src="https://img.shields.io/badge/PyTorch-2.1+-ee4c2c.svg" alt="PyTorch"></a>
    <a href="https://huggingface.co/docs/transformers"><img src="https://img.shields.io/badge/HuggingFace-Transformers-yellow.svg" alt="HuggingFace"></a>
    <a href="https://plotly.com/"><img src="https://img.shields.io/badge/Plotly-Heatmaps-3F4F75.svg" alt="Plotly"></a>
    <a href="https://github.com/turboderp-org/exllamav2"><img src="https://img.shields.io/badge/ExLlamaV2-Quantized-green.svg" alt="ExLlamaV2"></a>
  </p>
</p>

---

Given any open-source LLM and an evaluation probe, `layer-scan` finds the optimal layer duplication configuration `(i, j)` that maximizes model capability — **without modifying a single weight**.

## Why layer-scan?

| | Without layer-scan | With layer-scan |
|---|---|---|
| **Process** | Manually test 3,000+ (i,j) configs | One command |
| **Time** | Days of GPU time | Hours (automated) |
| **Output** | Spreadsheet of scores | Interactive heatmap + mergekit YAML |
| **Reproducibility** | Ad-hoc scripts | Deterministic logit scoring |

> The RYS authors manually scanned 3,241 configurations over several days. **layer-scan automates this entire process.**

### Interactive Heatmap

<p align="center">
  <img src="docs/heatmap-screenshot.png" alt="layer-scan heatmap — Qwen2-1.5B / math probe" width="800">
</p>

<p align="center"><em>Score delta heatmap for Qwen2-1.5B with math probe. Green = improvement over baseline, red = regression. Gold stars mark the top-5 configurations.</em></p>

## Features

- **Full (i,j) Configuration Scanning** — automated search across all valid layer duplication configs
- **Logit Distribution Scoring** — deterministic scoring without text generation, with coverage diagnostics
- **Multi-probe Cross-analysis** — scan multiple probes at once, find Pareto-optimal configs
- **Cross-tool Annotation** — overlay [neuro-scan](https://github.com/XXO47OXX/neuro-scan) layer labels on heatmaps
- **Scoring Diagnostics** — coverage field measures how much probability mass falls on scored tokens
- **Sparse-then-Dense Scanning** — two-phase strategy for faster exploration of large models
- **mergekit Integration** — one-click export of scan results as mergekit-compatible YAML
- **Interactive HTML Heatmaps** — Plotly-powered visualizations with hover details
- **Pre-computed Lookup** — fetch community scan results from HuggingFace Hub (no GPU needed)

## Installation

```bash
# pipx (recommended, isolated environment)
pipx install layer-scan

# pip
pip install layer-scan

# For the ExLlamaV2 backend (recommended for 70B+ models on consumer GPUs);
# quotes keep the brackets from being treated as a glob under zsh:
pip install "layer-scan[exllamav2]"

# For pre-computed lookup (no GPU required):
pip install "layer-scan[lookup]"
```

## Quick Start

```bash
# Scan with math reasoning probe
layer-scan scan --model Qwen/Qwen2-7B --probe math

# Scan and export mergekit config in one step
layer-scan scan --model Qwen/Qwen2-7B --probe math --export-mergekit config.yaml

# Multi-probe cross-analysis (find Pareto-optimal configs)
layer-scan multi-probe --model Qwen/Qwen2-7B --probes "math,eq,json"

# Cross-tool annotation (overlay neuro-scan labels on heatmap)
layer-scan annotate --results results.json --neuro-report neuro_report.json

# Look up pre-computed results (no GPU needed)
layer-scan lookup --model Qwen/Qwen2-7B --probe math

# Then merge with mergekit
mergekit-yaml config.yaml ./merged-model
```

### More examples

```bash
# JSON compliance probe (detects IFEval regressions)
layer-scan scan --model Qwen/Qwen2-7B --probe json

# EQ probe (emotional intelligence)
layer-scan scan --model Qwen/Qwen2-7B --probe eq

# ExLlamaV2 for large quantized models
layer-scan scan \
  --model /models/qwen2-72b-exl2 \
  --probe math \
  --backend exllamav2 \
  --gpu-split "22000,22000"

# Custom probe from JSON file
layer-scan scan --model <path> --probe custom --custom-probe my_probe.json

# Sparse scan first, then refine (faster for large models)
layer-scan scan --model <path> --sparse-first --sparse-step 4
```

## Commands

| Command | Description |
|---------|-------------|
| `scan` | Scan (i,j) configs with a single probe |
| `multi-probe` | Cross-probe scan, find Pareto-optimal configs |
| `annotate` | Overlay neuro-scan labels on layer-scan heatmap |
| `lookup` | Fetch pre-computed results from HuggingFace Hub |
| `probes` | List available evaluation probes |
| `version` | Show version |

## How It Works

### Logit Distribution Scoring

Unlike traditional evaluation (generate text -> parse -> score), layer-scan scores directly from the **logit probability distribution**:

```
Restrict to digit tokens [0-9]
-> Softmax over restricted set
-> Expected score = sum(value x probability)
-> Uncertainty = sum((value - expected)^2 x probability)
```

This is:
- **Deterministic** (no sampling variance)
- **Fast** (no autoregressive generation)
- **Information-rich** (uses full distribution, not just argmax)
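In code, the scheme above amounts to a few lines of NumPy. This is a minimal sketch; the function name and signature are illustrative, not the package's actual API:

```python
import numpy as np

def score_from_logits(logits: np.ndarray, digit_ids: list[int]) -> tuple[float, float]:
    """Expected score and uncertainty from one position's logits.

    `digit_ids` are the tokenizer's ids for "0".."9" (model-specific).
    Illustrative sketch, not the package's actual API.
    """
    digit_logits = logits[digit_ids]                   # restrict to digit tokens
    probs = np.exp(digit_logits - digit_logits.max())  # numerically stable softmax
    probs /= probs.sum()
    values = np.arange(10, dtype=float)                # digit values 0..9
    expected = float((values * probs).sum())           # sum(value * probability)
    uncertainty = float(((values - expected) ** 2 * probs).sum())
    return expected, uncertainty
```

Because the score is a pure function of the logits, two runs on the same model and probe produce bit-identical results.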

### Scoring Diagnostics

Each score includes a **coverage** field: the fraction of the full-vocabulary probability mass that falls on the scored digit tokens:

| Coverage | Interpretation |
|----------|---------------|
| > 0.5 | Reliable: the model is genuinely choosing among digits |
| 0.1 - 0.5 | Use with caution: a substantial share of probability mass falls on non-digit tokens |
| < 0.1 | Mostly noise: the model strongly prefers non-digit output |

Coverage is reported in scan summaries and included in `results.json` output.
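Coverage itself is cheap to compute: one softmax over the full vocabulary, then the mass on the digit tokens. A minimal sketch (the helper name is illustrative):

```python
import numpy as np

def coverage(logits: np.ndarray, digit_ids: list[int]) -> float:
    """Fraction of full-vocabulary probability mass on the scored digit tokens.

    Illustrative helper, not the package's actual API.
    """
    probs = np.exp(logits - logits.max())  # softmax over the *full* vocabulary
    probs /= probs.sum()
    return float(probs[digit_ids].sum())
```

Note the softmax here runs over the whole vocabulary, unlike the scoring softmax, which is restricted to the digit tokens.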

### Layer Duplication

For configuration `(i=45, j=52)` on an 80-layer model:

```
Standard:    [0, 1, ..., 79]              -> 80 layers
Duplicated:  [0, 1, ..., 51, 45, ..., 79] -> 87 layers
                           ^^^^^^
                     these 7 layers execute twice
```

The model passes its intermediate activations through this "reasoning cortex" a second time, deepening the computation without adding or modifying any weights.
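The execution order above can be written as a plain index list. This hypothetical helper mirrors mergekit's passthrough slices `[0, j]` and `[i, n_layers]`:

```python
def duplicated_layer_order(i: int, j: int, n_layers: int) -> list[int]:
    """Layer execution order for config (i, j): run 0..j-1, then repeat i..n-1.

    Illustrative helper mirroring mergekit passthrough slices.
    """
    if not (0 <= i < j <= n_layers):
        raise ValueError("require 0 <= i < j <= n_layers")
    return list(range(0, j)) + list(range(i, n_layers))

order = duplicated_layer_order(45, 52, 80)
len(order)  # 87 layers; layers 45..51 appear twice, all others once
```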

## Multi-probe Analysis

The `multi-probe` command scans multiple probes in a single session and identifies **Pareto-optimal** configurations — configs that are not dominated by any other config across all probes.

```bash
layer-scan multi-probe --model Qwen/Qwen2-7B --probes "math,eq,json"
```

### Pareto Frontier

A config is Pareto-optimal if no other config is at least as good on every probe and strictly better on at least one. This surfaces balanced configs that improve the model broadly rather than overfitting to a single task.
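The dominance check behind this is small enough to sketch. The score layout below is an assumption for illustration, not the package's internal representation:

```python
def pareto_optimal(scores: dict[tuple[int, int], dict[str, float]]) -> list[tuple[int, int]]:
    """Return configs not dominated by any other config.

    `scores` maps (i, j) -> {probe_name: score}. A config is dominated if
    some other config is at least as good on every probe and strictly
    better on at least one. Illustrative sketch.
    """
    def dominates(a: tuple[int, int], b: tuple[int, int]) -> bool:
        sa, sb = scores[a], scores[b]
        return (all(sa[p] >= sb[p] for p in sb)
                and any(sa[p] > sb[p] for p in sb))

    return [c for c in scores
            if not any(dominates(other, c) for other in scores if other != c)]
```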

### Output

The command produces `multi_probe.json` containing:
- **pareto_configs** — all Pareto-optimal (i,j) configs with per-probe scores
- **per_probe_best** — the single best config for each probe independently
- **normalized_score** — a balanced score for ranking Pareto configs

## Cross-tool Annotation

The `annotate` command overlays [neuro-scan](https://github.com/XXO47OXX/neuro-scan) layer labels onto layer-scan heatmaps, creating a unified visualization.

### Workflow

```bash
# Step 1: Scan layer duplication configs
layer-scan scan --model ./my-model --probe math

# Step 2: Run neuroanatomy analysis
neuro-scan map --model ./my-model --probe math

# Step 3: Annotate — overlay neuro-scan labels on heatmap
layer-scan annotate \
  --results ./results/results.json \
  --neuro-report ./results/report.json \
  --output annotated_heatmap.html
```

The annotated heatmap shows:
- neuro-scan layer labels (reasoning/syntax/output) as color bands
- How many "reasoning layers" each top config duplicates
- Explanation text: *"Config (i=12, j=20) is optimal because it duplicates layers 14, 16, 18 (all reasoning layers)"*

## Pre-computed Lookup

The `lookup` command fetches community-contributed scan results from HuggingFace Hub — **no GPU required**.

```bash
# Fetch pre-computed results
layer-scan lookup --model Qwen/Qwen2-7B --probe math

# Download full results.json locally
layer-scan lookup --model Qwen/Qwen2-7B --probe math --download
```

Requires the `lookup` extra: `pip install "layer-scan[lookup]"`

Results are sourced from the `XXO47OXX/layer-scan-results` HuggingFace dataset. Community contributions welcome.

## CLI Reference

### `scan` command

| Option | Type | Default | Description |
|--------|------|---------|-------------|
| `--model`, `-m` | string | *required* | Model path or HuggingFace ID |
| `--probe`, `-p` | string | `math` | Probe name: `math`, `eq`, `json`, `custom` |
| `--backend`, `-b` | string | `transformers` | Backend: `transformers`, `exllamav2` |
| `--min-block` | int | `7` | Minimum duplicated block size |
| `--step`, `-s` | int | `1` | Step size for scanning i and j |
| `--skip-early` | int | `0` | Skip N early layers |
| `--skip-late` | int | `0` | Skip N late layers |
| `--batch-size` | int | `16` | Samples per evaluation |
| `--top-k`, `-k` | int | `5` | Number of top configs to report |
| `--output`, `-o` | string | `./results` | Output directory |
| `--sparse-first` | flag | off | Do sparse scan first, then refine |
| `--sparse-step` | int | `4` | Step size for sparse scanning |
| `--custom-probe` | string | — | Path to custom probe JSON file |
| `--dtype` | string | `float16` | Model dtype: `float16`, `bfloat16`, `float32` |
| `--gpu-split` | string | — | GPU memory split in MB, e.g. `"22000,22000"` |
| `--export-mergekit` | string | — | Export top config as mergekit YAML to path |
| `--verbose`, `-v` | flag | off | Verbose logging |

## Output

### Interactive Heatmap (HTML)

The heatmap shows score delta vs. baseline for each `(i, j)` configuration. Green = improvement, red = regression. Gold stars mark top-k configs.

### mergekit Integration

```bash
# Scan and export in one command
layer-scan scan --model Qwen/Qwen2-72B --probe math --export-mergekit config.yaml

# The generated YAML is ready for mergekit
mergekit-yaml config.yaml ./merged-model --copy-tokenizer
```

Generated `config.yaml`:
```yaml
merge_method: passthrough
slices:
  - sources:
      - model: Qwen/Qwen2-72B
        layer_range: [0, 52]
  - sources:
      - model: Qwen/Qwen2-72B
        layer_range: [45, 80]
```
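Rendering such a config from a winning `(i, j)` is a one-function job. A sketch (the package's own exporter may differ):

```python
def mergekit_passthrough_yaml(model: str, i: int, j: int, n_layers: int) -> str:
    """Render a mergekit passthrough config for duplication config (i, j).

    Illustrative generator; layer_range slices [0, j] and [i, n_layers]
    duplicate layers i..j-1.
    """
    return (
        "merge_method: passthrough\n"
        "slices:\n"
        "  - sources:\n"
        f"      - model: {model}\n"
        f"        layer_range: [0, {j}]\n"
        "  - sources:\n"
        f"      - model: {model}\n"
        f"        layer_range: [{i}, {n_layers}]\n"
    )
```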

### Text Summary
```
============================================================
LAYER-SCAN RESULTS
============================================================
Model: Qwen2-72B-EXL2
Probe: math
Total layers: 80
Configs scanned: 342

Baseline score: 6.2341 (+-1.2045)

TOP CONFIGURATIONS:
------------------------------------------------------------
  #1: i= 45, j= 52 (block= 7 layers) -> score=6.8912 (delta=+0.6571)
  #2: i= 44, j= 52 (block= 8 layers) -> score=6.8734 (delta=+0.6393)
  ...
```

### JSON Results
Full results exported to `results.json` for programmatic analysis.
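Consuming the file typically looks like this; the field names below are assumptions for illustration, so check your own `results.json` for the actual schema:

```python
import json

def best_config(results: dict) -> dict:
    """Pick the highest-scoring config from a parsed results.json.

    Assumes a {"baseline": float, "configs": [{"i", "j", "score"}, ...]}
    shape -- illustrative only, not a stable contract.
    """
    best = max(results["configs"], key=lambda c: c["score"])
    return {**best, "delta": best["score"] - results["baseline"]}

# Usage (path depends on --output):
#   with open("results/results.json") as f:
#       print(best_config(json.load(f)))
```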

## Built-in Probes

| Probe | What it measures | Samples | Best for |
|-------|-----------------|---------|----------|
| `math` | Arithmetic, geometry, calculus, probability | 16 | Reasoning-focused models |
| `eq` | Social cues, sarcasm, psychology | 12 | Chat/assistant models |
| `json` | JSON extraction, escaping, schema compliance | 10 | IFEval / tool-use models |
| `custom` | User-defined from JSON file | Variable | Domain-specific evaluation |

## Backends

| Feature | Transformers | ExLlamaV2 |
|---------|-------------|-----------|
| **GPU Memory** | Full model in VRAM | Quantized (EXL2/GPTQ) |
| **Best for** | Small-medium models | 70B+ on consumer GPUs |
| **Multi-GPU** | — | `--gpu-split` |
| **Precision** | fp16/bf16/fp32 | Quantized |
| **Install** | Included | `pip install layer-scan[exllamav2]` |

## Custom Probes

Create a JSON file:

```json
{
  "name": "my_task",
  "description": "What this probe measures",
  "scoring": "digits",
  "samples": [
    {
      "prompt": "Rate from 0-9 how well...\nAnswer: ",
      "expected_score": 7.0,
      "metadata": {"category": "test"}
    }
  ]
}
```
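A loader with basic validation might look like the sketch below. The required fields and the 0-9 score range are inferred from the example above, not a documented contract:

```python
import json

def load_custom_probe(path: str) -> dict:
    """Load and sanity-check a custom probe file (illustrative validation)."""
    with open(path) as f:
        probe = json.load(f)
    for key in ("name", "scoring", "samples"):
        if key not in probe:
            raise ValueError(f"probe file missing required field: {key}")
    for sample in probe["samples"]:
        if "prompt" not in sample or "expected_score" not in sample:
            raise ValueError("each sample needs 'prompt' and 'expected_score'")
        if not 0.0 <= sample["expected_score"] <= 9.0:
            raise ValueError("expected_score must be in [0, 9] for digit scoring")
    return probe
```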

## Architecture

```
layer_scan/
├── cli.py              # Typer CLI (scan, multi-probe, annotate, lookup, probes, version)
├── scanner.py          # Core scan engine
├── scoring.py          # Logit distribution scoring with coverage diagnostics
├── heatmap.py          # Plotly visualization
├── export.py           # mergekit YAML export
├── config.py           # Configuration dataclasses
├── multi_probe.py      # Multi-probe Pareto analysis
├── annotate.py         # Cross-tool annotation with neuro-scan
├── lookup.py           # Pre-computed results from HF Hub
├── probes/
│   ├── base.py         # Probe ABC
│   ├── math_probe.py   # Math reasoning
│   ├── eq_probe.py     # Emotional intelligence
│   ├── json_probe.py   # JSON compliance
│   └── custom.py       # JSON file loader
└── backends/
    ├── base.py         # Backend ABC
    ├── transformers_backend.py  # HuggingFace (reference)
    └── exllamav2.py    # ExLlamaV2 (optimized)
```

## Roadmap

- [x] **v0.1.0**: Core scanning, heatmaps, mergekit export
- [x] **v0.2.0**: Multi-probe Pareto analysis, cross-tool annotation, scoring diagnostics
- [ ] **v0.3.0**: Pre-computed heatmap database (`lookup` command), vLLM backend
- [ ] **v1.0.0**: Custom probe DSL, Web UI, API server

## References

- [Repeat Yourself: Layer Duplication for LLMs](https://arxiv.org/abs/2502.01470) — Original RYS research
- [SOLAR 10.7B: Depth Up-Scaling](https://arxiv.org/abs/2312.15166) — DUS technique
- [MergeKit](https://github.com/arcee-ai/mergekit) — Model merging toolkit (6.9k stars)
- [ExLlamaV2](https://github.com/turboderp-org/exllamav2) — Optimized inference

## Contributing

See [CONTRIBUTING.md](CONTRIBUTING.md) for development setup, testing, and PR guidelines.

## Attribution & AI Policy

**Copyright Notice:** This project is Copyright (c) 2026 XXO47OXX and licensed under the MIT License.

### Original Design & Concept Protection (First Published: 2026-03-11)

This project represents the **original implementation** of the following design innovations:

**Core Architecture Decisions:**
- **Logit distribution scoring** for layer duplication evaluation — deterministic scoring without text generation or sampling
- **Full (i,j) configuration space scanning** — automated search across all valid layer duplication configs
- **Task-specific probe system** — different probes discover different optimal configurations (math vs json heatmaps are completely different)
- **mergekit passthrough YAML one-click export** — scan results directly usable with mergekit

**Design rationale documented since:** 2026-03-11 (Initial release)

If you build derivative works inspired by these architectural decisions, **please acknowledge the original source** in your project's README:
```markdown
**Inspired by:** [layer-scan](https://github.com/XXO47OXX/layer-scan)
```

### For Forks and Derivatives
If you fork or significantly adapt this codebase, please:
- Retain the copyright notice in all source files
- Include the NOTICE file in your distribution
- Credit the original repository in your README

### For AI Model Training & Web Scraping
Use of this codebase for AI/LLM training is governed by the `llms.txt` convention. We request that:
- Models trained on this code retain the copyright attribution
- Training pipelines respect the opt-out signals in `llms.txt`
- Verbatim code reproduction includes a reference to the original repository

**Provenance Identifier:** `LS-XXO47OXX-a3f7c9e1-2026`

See [llms.txt](llms.txt) and [NOTICE](NOTICE) for the complete policy.

## License

[MIT](LICENSE)
