Metadata-Version: 2.4
Name: openinterp
Version: 0.3.0
Summary: Python SDK + CLI for openinterp.org — Atlas search, SAE Traces, FabricationGuard hallucination detection, AgentProbeGuard mid-reasoning gate for code agents, InterpScore. The operational layer for mechanistic interpretability.
Project-URL: Homepage, https://openinterp.org
Project-URL: Repository, https://github.com/OpenInterpretability/cli
Project-URL: Documentation, https://openinterp.org/docs
Project-URL: Bug Tracker, https://github.com/OpenInterpretability/cli/issues
Project-URL: Changelog, https://github.com/OpenInterpretability/cli/blob/main/CHANGELOG.md
Author-email: Caio Vicentino <hi@openinterp.org>
License: Apache-2.0
License-File: LICENSE
Keywords: LLM,ai safety,circuit,fabrication,guard,hallucination,interpretability,linear probe,mechanistic interpretability,sae,sparse autoencoder,trace,transformer
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: Apache Software License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Requires-Python: >=3.10
Requires-Dist: click>=8.1
Requires-Dist: huggingface-hub>=0.27
Requires-Dist: pydantic>=2.0
Requires-Dist: requests>=2.31
Requires-Dist: rich>=13.0
Provides-Extra: dev
Requires-Dist: build>=1.0; extra == 'dev'
Requires-Dist: pytest-cov>=4; extra == 'dev'
Requires-Dist: pytest>=7; extra == 'dev'
Requires-Dist: ruff>=0.4; extra == 'dev'
Requires-Dist: twine>=5.0; extra == 'dev'
Provides-Extra: full
Requires-Dist: accelerate>=1.0; extra == 'full'
Requires-Dist: joblib>=1.3; extra == 'full'
Requires-Dist: numpy>=1.26; extra == 'full'
Requires-Dist: safetensors>=0.4; extra == 'full'
Requires-Dist: scikit-learn>=1.3; extra == 'full'
Requires-Dist: torch>=2.3; extra == 'full'
Requires-Dist: transformers>=4.55; extra == 'full'
Description-Content-Type: text/markdown

<div align="center">

# `openinterp`

### Python SDK + CLI for [openinterp.org](https://openinterp.org)

Search the feature Atlas, generate Traces from your own SAE, rank against the public InterpScore leaderboard.

[![PyPI](https://img.shields.io/pypi/v/openinterp.svg?color=8b5cf6)](https://pypi.org/project/openinterp/)
[![Python 3.10+](https://img.shields.io/badge/python-3.10%2B-blue)](https://www.python.org/)
[![License Apache 2.0](https://img.shields.io/badge/license-Apache%202.0-green)](./LICENSE)
[![openinterp.org](https://img.shields.io/badge/site-openinterp.org-8b5cf6)](https://openinterp.org)
[![Discussions](https://img.shields.io/github/discussions/OpenInterpretability/cli)](https://github.com/OpenInterpretability/cli/discussions)

</div>

---

## Install

```bash
pip install openinterp              # lite: Atlas + CLI (no torch, ~2 MB total)
pip install "openinterp[full]"      # + torch/transformers/safetensors for trace generation
```

Requires **Python ≥ 3.10**.

---

## Part of a 5-repo ecosystem

| Repo | What's in it |
|---|---|
| [`.github`](https://github.com/OpenInterpretability/.github) | Org profile + shared CoC + SECURITY |
| [`web`](https://github.com/OpenInterpretability/web) | Next.js site behind openinterp.org |
| [`notebooks`](https://github.com/OpenInterpretability/notebooks) | 23 training + interpretability notebooks |
| **`cli`** (you are here) | `pip install openinterp` — Python SDK |
| [`mechreward`](https://github.com/OpenInterpretability/mechreward) | SAE features as dense RL reward |

---

## 🚀 Quick start

### Search the Atlas (zero GPU, curated offline fallback)

```bash
$ openinterp atlas "overconfidence"
```

```
                    Atlas results: 'overconfidence'
┏━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┳━━━━━━━┳━━━━━━━━━━━━
┃ ID      ┃ Name                    ┃ Model             ┃ AUROC ┃ Description
┡━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━╇━━━━━━━╇━━━━━━━━━━━━
│ f2503   │ overconfidence_pattern  │ Qwen/Qwen3.6-27B  │  0.54 │ Definitive…
│ f1847   │ urgency_assessment      │ Qwen/Qwen3.6-27B  │  0.68 │ Time-critic…
└─────────┴─────────────────────────┴───────────────────┴───────┴────────────
```

```python
>>> from openinterp import search_features
>>> features = search_features("overconfidence", model="Qwen/Qwen3.6-27B")
>>> features[0].id
'f2503'
```

### Generate a Trace from your own SAE

```bash
pip install "openinterp[full]"

openinterp trace \
    --model google/gemma-2-2b \
    --sae-repo YOUR_HF_USER/gemma2-2b-sae-first \
    --prompt "The capital of France is" \
    --layer 12 \
    --d-model 2304 --d-sae 16384 --k 64 \
    --out my_trace.json
```

This command:
1. Loads the base model in bf16 with SDPA (no flash-attn)
2. Loads your SAE from HuggingFace (sae_lens `safetensors` format)
3. Generates tokens and captures residual-stream activations at layer 12
4. Applies the SAE and picks the top-10 most active features
5. Writes a `Trace` JSON matching [openinterp.org/observatory/trace](https://openinterp.org/observatory/trace) byte-for-byte
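
Steps 3 and 4 can be sketched in plain Python. This assumes the common `W_enc` (d_model × d_sae), `b_enc`, `b_dec` parameterisation with a TopK activation (ReLU applied to the kept values, a common variant of the Gao et al. recipe), and it assumes features are ranked by their peak activation across tokens. `trace.py` may encode and rank differently; the function names here are hypothetical.

```python
import heapq


def encode_topk(x, W_enc, b_enc, b_dec, k):
    """TopK SAE encoder: keep the k largest pre-activations (ReLU'd), zero the rest."""
    d_model, d_sae = len(x), len(b_enc)
    centered = [xi - bi for xi, bi in zip(x, b_dec)]  # subtract decoder bias first
    pre = [
        sum(centered[i] * W_enc[i][j] for i in range(d_model)) + b_enc[j]
        for j in range(d_sae)
    ]
    keep = set(heapq.nlargest(k, range(d_sae), key=lambda j: pre[j]))
    return [max(pre[j], 0.0) if j in keep else 0.0 for j in range(d_sae)]


def top_features(token_acts, n=10):
    """Rank features by peak activation across tokens; return up to n (index, value) pairs."""
    d_sae = len(token_acts[0])
    peak = [max(tok[j] for tok in token_acts) for j in range(d_sae)]
    ranked = sorted(range(d_sae), key=lambda j: peak[j], reverse=True)
    return [(j, peak[j]) for j in ranked[:n] if peak[j] > 0]
```

With `--k 64` and `--d-sae 16384`, each token activates at most 64 of the 16384 features, which keeps the per-token ranking cheap.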

### Python API

```python
from openinterp import generate_trace

trace = generate_trace(
    model_id="google/gemma-2-2b",
    sae_repo="YOUR_HF_USER/gemma2-2b-sae-first",
    prompt="The capital of France is",
    layer=12,
    d_model=2304,
    d_sae=16384,
    k=64,
)

print(trace.model_dump_json(indent=2))   # Trace Theater schema
```

### With feature labels from notebook 04

```bash
# After running 04_discover_features.ipynb (emits feature_catalog.json):
openinterp trace ... --catalog feature_catalog.json
```

Trace features inherit names from your catalog.
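
A minimal sketch of the label merge, assuming the catalog is a JSON object that maps each feature index (as a string) to a name. The schema that notebook 04 actually emits may differ, and `label_features` is a hypothetical helper.

```python
import json


def label_features(top, catalog_json):
    """Attach catalog names to (index, value) pairs; unlabeled features get a placeholder."""
    catalog = json.loads(catalog_json)
    return [(catalog.get(str(idx), f"feature_{idx}"), val) for idx, val in top]
```

Falling back to a `feature_<idx>` placeholder keeps traces usable when the catalog covers only part of the SAE's dictionary.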

---

## 🛡️ FabricationGuard (v0.2.0+)

[![FabricationGuard headline](https://huggingface.co/datasets/caiovicentino1/FabricationGuard-linearprobe-qwen36-27b/resolve/main/chart_hero.png)](https://huggingface.co/datasets/caiovicentino1/FabricationGuard-linearprobe-qwen36-27b)

Production hallucination probe on Qwen3.6-27B: AUROC 0.88 cross-task on SimpleQA, an **88% reduction in confident-wrong answers** in mitigation mode, and ~1 ms scoring latency.

```python
from openinterp import FabricationGuard

guard = FabricationGuard.from_pretrained("Qwen/Qwen3.6-27B")
output = guard.generate("Who won the 2003 Nobel Prize in Aerodynamics?", mode="abstain")
# → "I don't have reliable information to answer this confidently."
```
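
Here is a minimal sketch of an abstain gate wrapped around a linear probe. It assumes the probe is a logistic-regression head over a single hidden-state vector `h` and uses a hypothetical 0.5 threshold. The real `FabricationGuard` may pick the layer, token position, and threshold differently, and `probe_score` / `gated_answer` are illustrative names only.

```python
import math

# Abstention text taken from the example output above.
ABSTAIN_MESSAGE = "I don't have reliable information to answer this confidently."


def probe_score(h, w, b):
    """Logistic linear probe: P(fabrication) = sigmoid(w . h + b), computed stably."""
    z = sum(hi * wi for hi, wi in zip(h, w)) + b
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)  # avoids overflow in exp(-z) for large negative z
    return e / (1.0 + e)


def gated_answer(draft, h, w, b, threshold=0.5):
    """Return the draft answer, or the abstain message if the probe flags it."""
    return ABSTAIN_MESSAGE if probe_score(h, w, b) >= threshold else draft
```

Because scoring is a single dot product, the gate adds negligible cost on top of generation, which is consistent with the ~1 ms latency quoted above.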

**Methodology lineage**: extends [Anthropic's persona-vectors approach](https://arxiv.org/abs/2507.21509) (Aug 2025, evaluated on 7-8B models) to Qwen3.6-27B (3-4× larger), adding formal cross-task AUROC, bootstrap CIs, and mitigation-rate evaluation. It is a production-grade Apache-2.0 implementation, not a proprietary platform. Probe artifact: [`caiovicentino1/FabricationGuard-linearprobe-qwen36-27b`](https://huggingface.co/datasets/caiovicentino1/FabricationGuard-linearprobe-qwen36-27b). Live demo: [openinterp.org/products/fabricationguard](https://openinterp.org/products/fabricationguard).
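
The evaluation above relies on AUROC with bootstrap confidence intervals. Below is a stdlib-only sketch of both, using the Mann-Whitney formulation of AUROC and a percentile bootstrap. It is a generic illustration of the technique, not the evaluation code behind the published numbers. It assumes binary 0/1 labels with both classes present.

```python
import random


def auroc(scores, labels):
    """P(random positive outscores random negative); ties count 1/2 (Mann-Whitney U)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_ci(scores, labels, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap: resample examples with replacement and recompute AUROC."""
    rng = random.Random(seed)
    idx = range(len(scores))
    stats = []
    while len(stats) < n_boot:
        sample = [rng.choice(idx) for _ in idx]
        ys = [labels[i] for i in sample]
        if 0 < sum(ys) < len(ys):  # skip resamples that contain only one class
            stats.append(auroc([scores[i] for i in sample], ys))
    stats.sort()
    lo = stats[int(alpha / 2 * n_boot)]
    hi = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi
```

The pairwise loop is O(positives × negatives), fine for illustration. A rank-based or `sklearn.metrics.roc_auc_score` implementation is the practical choice at SimpleQA scale.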

## 🧠 ReasonGuard v0.1 (in registry)

[![ReasonGuard headline](https://huggingface.co/datasets/caiovicentino1/ReasoningGuard-linearprobe-qwen36-27b/resolve/main/reasoningguard_headline.png)](https://huggingface.co/datasets/caiovicentino1/ReasoningGuard-linearprobe-qwen36-27b)

Reasoning-faithfulness probe at **L55 / mid_think** on Qwen3.6-27B in thinking mode. Detects wrong-answer trajectories *during* the `<think>` block. **Honest narrow scope**: AUROC 0.888 within math reasoning (GSM8K), 0.605 cross-domain to commonsense (StrategyQA) — domain-bound, not generalized.

**Layer × position interaction (novel)**: shallow layers (L31) favor `end_question`; deep layers (L55) favor `mid_think`. Position-of-faithfulness migrates with depth.
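
To make the two positions concrete, the sketch below shows one possible reading. It *assumes* `end_question` is the last token before `<think>` and `mid_think` is the middle token of the `<think>` block's content. Neither definition comes from the probe's documentation, and `probe_positions` is hypothetical. In practice you would index the hidden states of a chosen layer (e.g. L31 or L55) at these positions.

```python
def probe_positions(tokens):
    """Hypothetical token indices for 'end_question' and 'mid_think' (see assumptions above)."""
    start = tokens.index("<think>")
    end = tokens.index("</think>", start + 1)
    first, last = start + 1, end - 1  # span of the reasoning content inside <think>
    return {"end_question": start - 1, "mid_think": (first + last) // 2}
```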

```python
from openinterp import probebench

probe = probebench.load("openinterp/reasonguard-qwen36-27b-l55-mid_think")
# `activations`: hidden states you captured at layer 55, mid_think position (capture not shown)
score = probe.score(activations)  # P(wrong-answer trajectory)
```

Both numbers (within + cross) registered honestly per ProbeBench's anti-Goodhart norms. Probe artifact: [`caiovicentino1/ReasoningGuard-linearprobe-qwen36-27b`](https://huggingface.co/datasets/caiovicentino1/ReasoningGuard-linearprobe-qwen36-27b). Live on [openinterp.org/probebench](https://openinterp.org/probebench/probe/openinterp%2Freasonguard-qwen36-27b-l55-mid_think).

## 🧬 ProbeBench (v0.2.0+)

The first categorical leaderboard for activation probes — 8 categories, 7-axis ProbeScore, anti-Goodhart by construction.

```python
from openinterp import probebench

probes = probebench.list_probes(category="hallucination")
probe  = probebench.load("openinterp/fabricationguard-qwen36-27b-l31-v2")
# `activations`: hidden states captured at the probe's layer (L31 per its id), not shown here
score  = probe.score(activations)
```

```bash
openinterp probebench list                       # show all registered probes
openinterp probebench load <probe-id>            # download + verify SHA-256
openinterp probebench validate ./my-probe/       # check artifact spec
openinterp probebench reproduce <probe-id>       # download reproducer notebook
```
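
`probebench load` verifies downloads against SHA-256. The following is a minimal sketch of that check. It assumes the registry publishes a hex digest per artifact file, and `sha256_of` / `verify_artifact` are hypothetical helpers, not the SDK's internals.

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Stream a file through SHA-256 so large probe weights never sit fully in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def verify_artifact(path, expected_hex):
    """Raise if the file's digest does not match the expected (case-insensitive) hex digest."""
    actual = sha256_of(path)
    if actual != expected_hex.lower():
        raise ValueError(f"SHA-256 mismatch for {path}: expected {expected_hex}, got {actual}")
    return actual
```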

Browse the leaderboard: [openinterp.org/probebench](https://openinterp.org/probebench).

---

## 🔧 v0.2.1 — `safe_load_qwen36_lora()`

Wraps the fix for a Qwen3.6 PEFT-save bug discovered during paper-2 grokking work (April 2026). Saved Qwen3.6 LoRA adapters carry an extra `.language_model.` infix in their state-dict keys, so `PeftModel.from_pretrained()` against a reloaded dense Qwen3.6 fails silently: the adapter appears to load, but the maximum logit difference from the base model is `0.000` and no error is raised.

```python
from openinterp import safe_load_qwen36_lora

model = safe_load_qwen36_lora(
    base_model_id="Qwen/Qwen3.6-27B",
    adapter_path="path/to/checkpoint-200",
)  # strips the .language_model. infix, then verifies max logit-diff > 0.01
```

Also exposed: `strip_language_model_infix()`, `verify_adapter_loaded()`, `LoRAVerificationError`. This bug invalidated ~10 hours of prior eval work before being caught — anyone working with Qwen3.6 LoRA save/reload pipelines should run the sanity check.
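
The two safeguards are simple to express. This is a minimal sketch of the key rewrite and the logit-diff check on plain Python containers. `strip_infix` and `check_adapter_changed_logits` are hypothetical counterparts, not the library's `strip_language_model_infix()` / `verify_adapter_loaded()`. The example key below is illustrative, and the 0.01 threshold mirrors the comment in the snippet above.

```python
INFIX = ".language_model."


def strip_infix(state_dict):
    """Rewrite keys like '...model.language_model.layers.0...' to '...model.layers.0...'."""
    return {key.replace(INFIX, ".", 1): value for key, value in state_dict.items()}


def check_adapter_changed_logits(base_logits, adapted_logits, min_diff=0.01):
    """Fail loudly when the adapter leaves logits unchanged (the silent 0.000 symptom)."""
    diff = max(abs(a - b) for a, b in zip(base_logits, adapted_logits))
    if diff <= min_diff:
        raise RuntimeError(f"adapter appears inert: max logit diff {diff:.3f} <= {min_diff}")
    return diff
```

Comparing against the base model's logits on the same input is what catches this class of bug: the load call itself succeeds, so only a behavioural check reveals that no weights were applied.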

---

## 📦 What's in v0.2.x

| Command | Status | What it does |
|---|---|---|
| `openinterp atlas <query>` | ✅ live | Feature search with offline fallback to curated demo features |
| `openinterp trace ...` | ✅ live (needs `[full]`) | Real SAE trace generation, sae_lens format, any HF model |
| `openinterp guard ...` | ✅ live | FabricationGuard scoring + abstain mode on Qwen3.6-27B |
| `openinterp probebench {list,load,score,validate,reproduce,submit}` | ✅ live | ProbeBench v0.0.1 SDK |
| `openinterp.lora.safe_load_qwen36_lora` | ✅ live (v0.2.1) | Safe Qwen3.6 LoRA loader with strip + verify |
| `openinterp info` | ✅ live | Version + optional-dep status |

### Planned v0.3.0

- `openinterp upload-trace <trace.json>` → shareable openinterp.org URL
- `openinterp score --sae-repo X` → compute InterpScore (wraps [notebook 18](https://github.com/OpenInterpretability/notebooks/blob/main/notebooks/18_interpscore_eval.ipynb))
- `openinterp steer --sae-repo X --feature Y --alpha Z` → intervention (wraps [notebook 06](https://github.com/OpenInterpretability/notebooks/blob/main/notebooks/06_steer_your_model.ipynb))
- `openinterp circuit --sae-repo X --prompt Y` → attribution graph JSON (wraps [notebook 14/15](https://github.com/OpenInterpretability/notebooks/))
- `openinterp publish <repo>` → HuggingFace release with model card
- ReasonGuard v0.2 — multi-bench training (math + commonsense) to fix cross-domain transfer

Open an issue on the [tracker](https://github.com/OpenInterpretability/cli/issues) if you'd like to take one of these on.

---

## 🛠️ Development

```bash
git clone https://github.com/OpenInterpretability/cli openinterp-cli
cd openinterp-cli
python -m venv .venv && source .venv/bin/activate
pip install -e ".[dev,full]"          # dev = pytest + ruff + build + twine; full = torch + transformers + safetensors
pytest -xvs                            # 5 tests, ~1s
```

### Package layout

```
openinterp-cli/
├── pyproject.toml              # name='openinterp', hatchling build
├── openinterp/
│   ├── __init__.py             # public exports + __version__
│   ├── models.py               # pydantic types: AtlasFeature, Trace, TraceFeature
│   ├── atlas.py                # search_features() — HF API + curated fallback
│   ├── trace.py                # generate_trace() — real transformers-based impl
│   └── cli.py                  # click-based CLI: atlas / trace / info
├── tests/
│   ├── test_atlas.py
│   └── test_trace.py
├── CHANGELOG.md
├── CONTRIBUTING.md
└── README.md
```

### Contribution recipe — add a new command

> Full rules: [CONTRIBUTING.md](./CONTRIBUTING.md).

1. Decide which notebook it wraps (score → 18, steer → 06, circuit → 14/15, publish → generic)
2. Add a function to the matching file (`openinterp/score.py`, etc.). Keep it small — actual compute lives in the notebook.
3. Expose it in `__init__.py`
4. Add a `@main.command()` in `cli.py` with click decorators
5. Add a smoke test in `tests/test_<name>.py`
6. Update `CHANGELOG.md` under `[Unreleased]`
7. PR title: `Add openinterp <command>`
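
For step 4, a minimal sketch of a click command, using the planned `openinterp score --sae-repo X` as the example. The `main` group here is a stand-in for the one in `cli.py`, and the command body is a placeholder. Real compute would go through a function exposed in `__init__.py` (e.g. from a hypothetical `openinterp/score.py`).

```python
import click


@click.group()
def main() -> None:
    """Stand-in for the existing `main` group in openinterp/cli.py."""


@main.command()
@click.option("--sae-repo", required=True, help="HuggingFace repo id of the SAE to score.")
def score(sae_repo: str) -> None:
    """Compute InterpScore for an SAE (planned; compute lives in notebook 18)."""
    # Keep the CLI layer thin: parse options here, delegate the work to the SDK function.
    click.echo(f"Scoring {sae_repo} ...")
```

The smoke test in step 5 can drive this through `click.testing.CliRunner` without spawning a subprocess.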

**Hard rules**:
- Python ≥ 3.10 syntax (PEP 604 unions OK)
- `dtype=torch.bfloat16`, never `torch_dtype=` (`torch_dtype=` is deprecated in transformers 5.x)
- SDPA only, never flash-attn
- New heavy deps (`torch` tier) → add to `[full]` extra, not base
- Every new public function has type hints + docstring

---

## 🚢 Release process (maintainer)

```bash
# 1. Bump version in BOTH:
#    pyproject.toml          ([project] version = "X.Y.Z")
#    openinterp/__init__.py  (__version__ = "X.Y.Z")
# 2. Update CHANGELOG.md — move [Unreleased] → [X.Y.Z] — YYYY-MM-DD

source .venv/bin/activate
rm -rf dist/
python -m build
python -m twine check dist/*
python -m twine upload dist/*     # needs PyPI token in ~/.pypirc

git tag vX.Y.Z
git push --tags
```
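
Step 1 requires the version in two files to stay in sync. A small pre-release check can catch a missed bump. This is a regex-based sketch that assumes the first top-level `version = "..."` line in `pyproject.toml` is the `[project]` version (on Python 3.11+ `tomllib` would be more robust). `versions_match` is a hypothetical helper, not part of the repo.

```python
import re

_PYPROJECT_RE = re.compile(r'^version\s*=\s*"([^"]+)"', re.MULTILINE)
_INIT_RE = re.compile(r'^__version__\s*=\s*["\']([^"\']+)["\']', re.MULTILINE)


def versions_match(pyproject_text: str, init_text: str) -> bool:
    """True when pyproject.toml and openinterp/__init__.py declare the same version."""
    py = _PYPROJECT_RE.search(pyproject_text)
    init = _INIT_RE.search(init_text)
    if py is None or init is None:
        raise ValueError("version string not found in one of the files")
    return py.group(1) == init.group(1)
```

Usage: `versions_match(Path("pyproject.toml").read_text(), Path("openinterp/__init__.py").read_text())` with `from pathlib import Path`, run before `python -m build`.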

---

## CI

Every PR runs:
- `pytest -xvs` across Python 3.10, 3.11, 3.12 (see `.github/workflows/ci.yml`)
- `ruff check .` (warn-only for now)
- `python -m build` + `twine check`

Green required to merge.

---

## Community

- 💬 [Discussions](https://github.com/OpenInterpretability/cli/discussions) — API proposals, "which repo should this live in"
- 🟢 [Good-first-issues](https://github.com/OpenInterpretability/cli/issues?q=is%3Aopen+is%3Aissue+label%3A%22good+first+issue%22)
- 📦 [PyPI release history](https://pypi.org/project/openinterp/#history)
- ✉️ hi@openinterp.org

---

## Standing on the shoulders of

- [Neuronpedia](https://neuronpedia.org) · the SAE encyclopedia
- [Gemma Scope](https://huggingface.co/google/gemma-scope) · reference SAE suite
- [Gao et al. 2024](https://arxiv.org/abs/2406.04093) · TopK + AuxK recipe
- [SAELens](https://github.com/jbloomAus/SAELens) · our safetensors format

---

## License

**Apache-2.0.** Built by [Caio Vicentino](https://huggingface.co/caiovicentino1) + OpenInterpretability. 2026.

[openinterp.org](https://openinterp.org) · [github.com/OpenInterpretability](https://github.com/OpenInterpretability) · [hi@openinterp.org](mailto:hi@openinterp.org)
