Metadata-Version: 2.4
Name: layer-scan
Version: 0.1.0
Summary: Automated LLM layer duplication configuration scanner with heatmap visualization
Project-URL: Homepage, https://github.com/XXO47OXX/layer-scan
Project-URL: Documentation, https://github.com/XXO47OXX/layer-scan#readme
Project-URL: Issues, https://github.com/XXO47OXX/layer-scan/issues
Project-URL: Changelog, https://github.com/XXO47OXX/layer-scan/blob/main/CHANGELOG.md
Author: XXO47OXX
License-Expression: MIT
License-File: LICENSE
License-File: NOTICE
Keywords: layer-duplication,llm,optimization,rys,transformer
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Requires-Python: >=3.10
Requires-Dist: numpy>=1.24.0
Requires-Dist: plotly>=5.18.0
Requires-Dist: rich>=13.0.0
Requires-Dist: safetensors>=0.4.0
Requires-Dist: torch>=2.1.0
Requires-Dist: tqdm>=4.65.0
Requires-Dist: transformers>=4.36.0
Requires-Dist: typer>=0.9.0
Provides-Extra: dev
Requires-Dist: pytest-cov>=4.1.0; extra == 'dev'
Requires-Dist: pytest>=7.4.0; extra == 'dev'
Requires-Dist: ruff>=0.1.0; extra == 'dev'
Provides-Extra: exllamav2
Requires-Dist: exllamav2>=0.1.0; extra == 'exllamav2'
Provides-Extra: vllm
Requires-Dist: vllm>=0.3.0; extra == 'vllm'
Description-Content-Type: text/markdown

<p align="center">
  <h1 align="center">layer-scan</h1>
  <p align="center">
    <strong>Automated LLM layer duplication config scanner — find the optimal (i,j) for any model + task</strong>
  </p>
  <p align="center">
    <a href="https://pypi.org/project/layer-scan/"><img src="https://img.shields.io/pypi/v/layer-scan" alt="PyPI"></a>
    <a href="https://pypi.org/project/layer-scan/"><img src="https://img.shields.io/pypi/pyversions/layer-scan" alt="Python"></a>
    <a href="https://github.com/XXO47OXX/layer-scan/blob/main/LICENSE"><img src="https://img.shields.io/github/license/XXO47OXX/layer-scan" alt="License"></a>
    <a href="https://github.com/XXO47OXX/layer-scan/actions"><img src="https://img.shields.io/github/actions/workflow/status/XXO47OXX/layer-scan/ci.yml" alt="CI"></a>
    <a href="https://codecov.io/gh/XXO47OXX/layer-scan"><img src="https://img.shields.io/codecov/c/github/XXO47OXX/layer-scan" alt="Coverage"></a>
  </p>
  <p align="center">
    <a href="https://python.org/"><img src="https://img.shields.io/badge/Python-3.10+-blue.svg" alt="Python"></a>
    <a href="https://pytorch.org/"><img src="https://img.shields.io/badge/PyTorch-2.1+-ee4c2c.svg" alt="PyTorch"></a>
    <a href="https://huggingface.co/docs/transformers"><img src="https://img.shields.io/badge/HuggingFace-Transformers-yellow.svg" alt="HuggingFace"></a>
    <a href="https://plotly.com/"><img src="https://img.shields.io/badge/Plotly-Heatmaps-3F4F75.svg" alt="Plotly"></a>
    <a href="https://github.com/turboderp-org/exllamav2"><img src="https://img.shields.io/badge/ExLlamaV2-Quantized-green.svg" alt="ExLlamaV2"></a>
  </p>
</p>

---

Given any open-source LLM and an evaluation probe, `layer-scan` finds the optimal layer duplication configuration `(i, j)` that maximizes model capability — **without modifying a single weight**.

## Why layer-scan?

| | Without layer-scan | With layer-scan |
|---|---|---|
| **Process** | Manually test 3,000+ (i,j) configs | One command |
| **Time** | Days of GPU time | Hours (automated) |
| **Output** | Spreadsheet of scores | Interactive heatmap + mergekit YAML |
| **Reproducibility** | Ad-hoc scripts | Deterministic logit scoring |

> The RYS authors manually scanned 3,241 configurations over several days. **layer-scan automates this entire process.**

## Installation

```bash
# pipx (recommended, isolated environment)
pipx install layer-scan

# pip
pip install layer-scan

# ExLlamaV2 backend (recommended for 70B+ models on consumer GPUs);
# quotes keep shells like zsh from expanding the brackets
pip install "layer-scan[exllamav2]"
```

## Quick Start

```bash
# Scan with math reasoning probe
layer-scan scan --model Qwen/Qwen2-7B --probe math

# Scan and export mergekit config in one step
layer-scan scan --model Qwen/Qwen2-7B --probe math --export-mergekit config.yaml

# Then merge with mergekit
mergekit-yaml config.yaml ./merged-model
```

### More examples

```bash
# JSON compliance probe (detects IFEval regressions)
layer-scan scan --model Qwen/Qwen2-7B --probe json

# EQ probe (emotional intelligence)
layer-scan scan --model Qwen/Qwen2-7B --probe eq

# ExLlamaV2 for large quantized models
layer-scan scan \
  --model /models/qwen2-72b-exl2 \
  --probe math \
  --backend exllamav2 \
  --gpu-split "22000,22000"

# Custom probe from JSON file
layer-scan scan --model <path> --probe custom --custom-probe my_probe.json

# Sparse scan first, then refine (faster for large models)
layer-scan scan --model <path> --sparse-first --sparse-step 4
```

## How It Works

### Logit Distribution Scoring

Unlike traditional evaluation (generate text -> parse -> score), layer-scan scores directly from the **logit probability distribution**:

```
Restrict to digit tokens [0-9]
-> Softmax over restricted set
-> Expected score = sum(value x probability)
-> Uncertainty = sum((value - expected)^2 x probability)
```

This is:
- **Deterministic** (no sampling variance)
- **Fast** (no autoregressive generation)
- **Information-rich** (uses full distribution, not just argmax)
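
The scoring step above can be sketched in a few lines of NumPy. This is an illustrative sketch, not the package's internal API; `digit_token_ids` stands in for the tokenizer's mapping from digit characters to token ids:

```python
import numpy as np

def digit_score(logits: np.ndarray, digit_token_ids: np.ndarray) -> tuple[float, float]:
    """Score one next-token logit vector on the digit tokens 0-9.

    `digit_token_ids[d]` is the token id for digit d (hypothetical mapping;
    in practice it comes from the tokenizer).
    """
    restricted = logits[digit_token_ids]           # logits for tokens "0".."9"
    probs = np.exp(restricted - restricted.max())  # numerically stable softmax
    probs /= probs.sum()
    values = np.arange(10, dtype=float)
    expected = float(np.sum(values * probs))                        # E[score]
    uncertainty = float(np.sum((values - expected) ** 2 * probs))   # Var[score]
    return expected, uncertainty
```

Because the score is a deterministic function of the logits, two runs on the same weights and prompts produce identical numbers, which is what makes full (i, j) scans comparable.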

### Layer Duplication

For configuration `(i=45, j=52)` on an 80-layer model:

```
Standard:    [0, 1, ..., 79]              -> 80 layers
Duplicated:  [0, 1, ..., 51, 45, ..., 79] -> 87 layers
                           ^^^^^^
                     these 7 layers execute twice
```

The model runs its hidden states through this "reasoning cortex" a second time, adding effective depth to the block most relevant to the probed task.
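
The duplicated execution order follows mechanically from `(i, j)` and the layer count. An illustrative sketch (the function name is hypothetical):

```python
def duplicated_order(n_layers: int, i: int, j: int) -> list[int]:
    """Layer execution order for duplication config (i, j).

    Runs layers 0..j-1, then jumps back to i and runs i..n_layers-1,
    so layers i..j-1 execute twice.
    """
    return list(range(0, j)) + list(range(i, n_layers))

# Example from above: (i=45, j=52) on an 80-layer model
order = duplicated_order(80, 45, 52)
```

For the example config, `order` has 87 entries and layers 45-51 each appear twice, matching the diagram above.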

## CLI Reference

### `scan` command

| Option | Type | Default | Description |
|--------|------|---------|-------------|
| `--model`, `-m` | string | *required* | Model path or HuggingFace ID |
| `--probe`, `-p` | string | `math` | Probe name: `math`, `eq`, `json`, `custom` |
| `--backend`, `-b` | string | `transformers` | Backend: `transformers`, `exllamav2` |
| `--min-block` | int | `7` | Minimum duplicated block size |
| `--step`, `-s` | int | `1` | Step size for scanning i and j |
| `--skip-early` | int | `0` | Skip N early layers |
| `--skip-late` | int | `0` | Skip N late layers |
| `--batch-size` | int | `16` | Samples per evaluation |
| `--top-k`, `-k` | int | `5` | Number of top configs to report |
| `--output`, `-o` | string | `./results` | Output directory |
| `--sparse-first` | flag | off | Do sparse scan first, then refine |
| `--sparse-step` | int | `4` | Step size for sparse scanning |
| `--custom-probe` | string | — | Path to custom probe JSON file |
| `--dtype` | string | `float16` | Model dtype: `float16`, `bfloat16`, `float32` |
| `--gpu-split` | string | — | GPU memory split in MB, e.g. `"22000,22000"` |
| `--export-mergekit` | string | — | Export top config as mergekit YAML to path |
| `--verbose`, `-v` | flag | off | Verbose logging |

## Output

### Interactive Heatmap (HTML)

The heatmap shows score delta vs. baseline for each `(i, j)` configuration. Green = improvement, red = regression. Gold stars mark top-k configs.

### mergekit Integration

```bash
# Scan and export in one command
layer-scan scan --model Qwen/Qwen2-72B --probe math --export-mergekit config.yaml

# The generated YAML is ready for mergekit
mergekit-yaml config.yaml ./merged-model --copy-tokenizer
```

Generated `config.yaml`:
```yaml
merge_method: passthrough
slices:
  - sources:
      - model: Qwen/Qwen2-72B
        layer_range: [0, 52]
  - sources:
      - model: Qwen/Qwen2-72B
        layer_range: [45, 80]
```
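
The two slices follow directly from `(i, j)` and the layer count, since mergekit's `layer_range` end index is exclusive: `[0, 52]` runs layers 0-51 and `[45, 80]` re-runs 45-51 before continuing to the end. A hypothetical rendering helper (the package's actual exporter lives in `layer_scan/export.py`):

```python
def mergekit_slices(model: str, n_layers: int, i: int, j: int) -> str:
    """Render a mergekit passthrough config for duplication config (i, j).

    Illustrative sketch, not the package's real export code.
    """
    lines = [
        "merge_method: passthrough",
        "slices:",
        "  - sources:",
        f"      - model: {model}",
        f"        layer_range: [0, {j}]",      # layers 0..j-1
        "  - sources:",
        f"      - model: {model}",
        f"        layer_range: [{i}, {n_layers}]",  # layers i..n_layers-1
    ]
    return "\n".join(lines) + "\n"
```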

### Text Summary
```
============================================================
LAYER-SCAN RESULTS
============================================================
Model: Qwen2-72B-EXL2
Probe: math
Total layers: 80
Configs scanned: 342

Baseline score: 6.2341 (+-1.2045)

TOP CONFIGURATIONS:
------------------------------------------------------------
  #1: i= 45, j= 52 (block= 7 layers) -> score=6.8912 (delta=+0.6571)
  #2: i= 44, j= 52 (block= 8 layers) -> score=6.8734 (delta=+0.6393)
  ...
```

### JSON Results
Full results exported to `results.json` for programmatic analysis.

## Built-in Probes

| Probe | What it measures | Samples | Best for |
|-------|-----------------|---------|----------|
| `math` | Arithmetic, geometry, calculus, probability | 16 | Reasoning-focused models |
| `eq` | Social cues, sarcasm, psychology | 12 | Chat/assistant models |
| `json` | JSON extraction, escaping, schema compliance | 10 | IFEval / tool-use models |
| `custom` | User-defined from JSON file | Variable | Domain-specific evaluation |

## Backends

| Feature | Transformers | ExLlamaV2 |
|---------|-------------|-----------|
| **GPU Memory** | Full model in VRAM | Quantized (EXL2/GPTQ) |
| **Best for** | Small-medium models | 70B+ on consumer GPUs |
| **Multi-GPU** | — | `--gpu-split` |
| **Precision** | fp16/bf16/fp32 | Quantized |
| **Install** | Included | `pip install "layer-scan[exllamav2]"` |

## Custom Probes

Create a JSON file:

```json
{
  "name": "my_task",
  "description": "What this probe measures",
  "scoring": "digits",
  "samples": [
    {
      "prompt": "Rate from 0-9 how well...\nAnswer: ",
      "expected_score": 7.0,
      "metadata": {"category": "test"}
    }
  ]
}
```
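
A minimal loader for this format might look like the following. This is illustrative only: the field names come from the example above, and the validation is not the package's actual code:

```python
import json

def load_probe(path: str) -> dict:
    """Load and minimally validate a custom probe JSON file."""
    with open(path, encoding="utf-8") as f:
        probe = json.load(f)
    for field in ("name", "scoring", "samples"):
        if field not in probe:
            raise ValueError(f"probe file missing required field: {field!r}")
    if not probe["samples"]:
        raise ValueError("probe needs at least one sample")
    for idx, sample in enumerate(probe["samples"]):
        if "prompt" not in sample:
            raise ValueError(f"sample {idx} is missing 'prompt'")
    return probe
```

With `"scoring": "digits"`, prompts should end at the point where the model emits a single digit (e.g. after `"Answer: "`), so the logit scoring described above applies directly.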

## Architecture

```
layer_scan/
├── cli.py              # Typer CLI
├── scanner.py          # Core scan engine
├── scoring.py          # Logit distribution scoring
├── heatmap.py          # Plotly visualization
├── export.py           # mergekit YAML export
├── config.py           # Configuration dataclasses
├── probes/
│   ├── base.py         # Probe ABC
│   ├── math_probe.py   # Math reasoning
│   ├── eq_probe.py     # Emotional intelligence
│   ├── json_probe.py   # JSON compliance
│   └── custom.py       # JSON file loader
└── backends/
    ├── base.py         # Backend ABC
    ├── transformers_backend.py  # HuggingFace (reference)
    └── exllamav2.py    # ExLlamaV2 (optimized)
```

## Roadmap

- **v0.2.0**: Multi-probe cross-analysis, vLLM backend, pre-computed heatmap database
- **v0.3.0**: Sparse sampling acceleration, HuggingFace Hub integration
- **v1.0.0**: Custom probe DSL, Web UI, API server

## References

- [Repeat Yourself: Layer Duplication for LLMs](https://arxiv.org/abs/2502.01470) — Original RYS research
- [SOLAR 10.7B: Depth Up-Scaling](https://arxiv.org/abs/2312.15166) — DUS technique
- [MergeKit](https://github.com/arcee-ai/mergekit) — Model merging toolkit (6.9k stars)
- [ExLlamaV2](https://github.com/turboderp-org/exllamav2) — Optimized inference

## Contributing

See [CONTRIBUTING.md](CONTRIBUTING.md) for development setup, testing, and PR guidelines.

## Attribution & AI Policy

**Copyright Notice:** This project is Copyright (c) 2026 XXO47OXX and licensed under the MIT License.

### Original Design & Concept Protection (First Published: 2026-03-11)

This project represents the **original implementation** of the following design innovations:

**Core Architecture Decisions:**
- **Logit distribution scoring** for layer duplication evaluation — deterministic scoring without text generation or sampling
- **Full (i,j) configuration space scanning** — automated search across all valid layer duplication configs
- **Task-specific probe system** — different probes discover different optimal configurations (the math and json heatmaps for the same model can differ completely)
- **mergekit passthrough YAML one-click export** — scan results directly usable with mergekit

**Design rationale documented since:** 2026-03-11 (Initial release)

If you build derivative works inspired by these architectural decisions, **please acknowledge the original source** in your project's README:
```markdown
**Inspired by:** [layer-scan](https://github.com/XXO47OXX/layer-scan)
```

### For Forks and Derivatives
If you fork or significantly adapt this codebase, please:
- Retain the copyright notice in all source files
- Include the NOTICE file in your distribution
- Credit the original repository in your README

### For AI Model Training & Web Scraping
Use of this codebase for AI/LLM training is governed by the `llms.txt` convention. We request that:
- Models trained on this code retain the copyright attribution
- Training pipelines respect the opt-out signals in `llms.txt`
- Verbatim code reproduction includes a reference to the original repository

**Provenance Identifier:** `LS-XXO47OXX-a3f7c9e1-2026`

See [llms.txt](llms.txt) and [NOTICE](NOTICE) for the complete policy.

## License

[MIT](LICENSE)
