Metadata-Version: 2.4
Name: kiji-inspector
Version: 0.5.0rc0
Summary: Kiji Inspector
Author-email: 575 Lab - Dataiku's Open Source Office <opensource@dataiku.com>
License: Apache 2.0
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: Apache Software License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Requires-Python: <3.14,>=3.10
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: torch>=2.10.0
Requires-Dist: torchvision>=0.25.0
Requires-Dist: numpy>=1.26.0
Requires-Dist: tqdm>=4.66.0
Requires-Dist: pyarrow>=15.0.0
Requires-Dist: rich>=13.0.0
Requires-Dist: pandas>=2.0.0
Requires-Dist: scipy>=1.11.0
Requires-Dist: scikit-learn>=1.3.0
Requires-Dist: datasets>=3.0.0
Requires-Dist: huggingface-hub>=1.13.0
Provides-Extra: full
Requires-Dist: torch>=2.11.0; extra == "full"
Requires-Dist: torchvision>=0.26.0; extra == "full"
Requires-Dist: transformers>=5.0.0; extra == "full"
Requires-Dist: accelerate>=1.13.0; extra == "full"
Requires-Dist: vllm==0.20.1; extra == "full"
Dynamic: license-file


# Kiji Inspector: Mechanistic Interpretability for AI Agent Tool Selection

<div align="center">
  <img src="https://raw.githubusercontent.com/dataiku/kiji-inspector/main/static/kiji_inspector_inverted.png" alt="Kiji Inspector" width="300">

  <p>
    <a href="https://github.com/dataiku/kiji-inspector/actions/workflows/ci-core.yml"><img src="https://github.com/dataiku/kiji-inspector/actions/workflows/ci-core.yml/badge.svg" alt="CI Core"></a>
    <a href="https://github.com/dataiku/kiji-inspector/actions/workflows/ci-extras.yml"><img src="https://github.com/dataiku/kiji-inspector/actions/workflows/ci-extras.yml/badge.svg" alt="CI Extras"></a>
    <a href="LICENSE"><img src="https://img.shields.io/badge/license-Apache%20License%202.0-blue" alt="License: Apache 2.0"></a>
    <a href="https://github.com/dataiku/kiji-inspector/stargazers"><img src="https://img.shields.io/github/stars/dataiku/kiji-inspector?style=social" alt="GitHub Stars"></a>
    <a href="https://github.com/dataiku/kiji-inspector/issues"><img src="https://img.shields.io/github/issues/dataiku/kiji-inspector" alt="GitHub Issues"></a>
  </p>

  <p>
    <img src="https://img.shields.io/badge/python-%3E%3D3.10-3776AB?logo=python&logoColor=white" alt="Python Version">
  </p>

  <p>
    <img src="https://img.shields.io/badge/LLMs-responsible-blue" alt="Responsible AI">
    <img src="https://img.shields.io/badge/contributions-welcome-brightgreen" alt="Contributions Welcome">
    <img src="https://img.shields.io/badge/PRs-welcome-brightgreen" alt="PRs Welcome">
  </p>
</div>

## Status
This project is **under heavy active development**. We are planning to release a stable version of the framework in the coming weeks.

In the meantime, join our [Slack Community](https://join.slack.com/t/dataiku-opensource/shared_invite/zt-3o6yq14rp-FTtAHZYhyru~jLZ~S6xPLA).

Learn more about our approach and early results:

* [Paper](paper/Opening%20the%20Black%20Box%20Mechanistic%20Interpretability%20of%20Agent%20Tool%20Selection%20with%20Sparse%20Autoencoders.pdf)
* [Presentation](presentation/Opening%20the%20Black%20Box%20Mechanistic%20Interpretability%20of%20Agent%20Tool%20Selection%20with%20Sparse%20Autoencoders.pdf)

---

## What This Project Does

This project trains **Sparse Autoencoders (SAEs)** on the internal activations of an AI agent to understand *why* it selects specific tools. Given a user request like "Search our docs for API limits," the agent must choose between tools (e.g., `internal_search` vs `web_search`). We extract the model's hidden representations at the moment of that decision, decompose them into interpretable features using a JumpReLU SAE, and validate the resulting explanations through automated fuzzing and causal ablation experiments.
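
For readers unfamiliar with JumpReLU, here is a minimal NumPy sketch of the encode/decode path. It uses the usual formulation f(x) = z · H(z − θ) with z = x W_enc + b_enc: a feature keeps its pre-activation only when that value exceeds its threshold θ. The class name, the random initialization, and the fixed threshold are illustrative assumptions. This is not the `kiji_inspector.SAE` implementation, and it omits training entirely.

```python
import numpy as np


class JumpReLUSAE:
    """Toy JumpReLU sparse autoencoder (illustrative sketch, not kiji_inspector.SAE)."""

    def __init__(self, d_model: int, d_sae: int, threshold: float = 0.1, seed: int = 0):
        rng = np.random.default_rng(seed)
        # Randomly initialised weights; a trained SAE would learn all of these.
        self.W_enc = rng.normal(0.0, 0.1, size=(d_model, d_sae))
        self.b_enc = np.zeros(d_sae)
        self.W_dec = self.W_enc.T.copy()
        self.b_dec = np.zeros(d_model)
        # Per-feature threshold theta (fixed here; learned in a real JumpReLU SAE).
        self.theta = np.full(d_sae, threshold)

    def encode(self, x: np.ndarray) -> np.ndarray:
        z = x @ self.W_enc + self.b_enc
        # JumpReLU: z * H(z - theta), i.e. keep z only where it exceeds theta.
        return z * (z > self.theta)

    def decode(self, f: np.ndarray) -> np.ndarray:
        # Linear decoder back to the model's activation space.
        return f @ self.W_dec + self.b_dec


sae = JumpReLUSAE(d_model=16, d_sae=64)
x = np.random.default_rng(1).normal(size=(4, 16))  # stand-in for hidden states
features = sae.encode(x)
reconstruction = sae.decode(features)
print(features.shape, reconstruction.shape, (features > 0).mean())
```

Unlike a plain ReLU, the thresholded gate zeroes small positive pre-activations as well as negative ones. This is what keeps the feature vector sparse.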

The key insight: train the SAE on **raw activations** (not difference vectors), then use **contrastive pairs** post-hoc to identify which learned features correspond to specific tool-selection decisions. This preserves the SAE's general feature dictionary while enabling targeted analysis of decision-relevant features.
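
The post-hoc step can be illustrated with a simple scoring rule. Encode both prompts of each contrastive pair, then rank features by their mean per-pair activation difference. This mean-difference score and the helper name `rank_contrastive_features` are assumptions chosen for clarity. The project's actual selection criterion may differ; see the paper linked above.

```python
import numpy as np


def rank_contrastive_features(feats_a: np.ndarray, feats_b: np.ndarray, top_k: int = 5):
    """Rank SAE features by their mean activation difference across contrastive pairs.

    Row i of feats_a and feats_b holds the SAE feature activations for the two
    prompts of pair i (e.g. one routed to internal_search, one to web_search).
    Returns (feature_index, mean_difference) tuples, strongest first.
    """
    if feats_a.shape != feats_b.shape:
        raise ValueError("paired activation arrays must have the same shape")
    # Per-pair difference averaged over pairs: > 0 means the feature leans to side A.
    mean_diff = (feats_a - feats_b).mean(axis=0)
    # Sort by magnitude so both A-leaning and B-leaning features surface.
    order = np.argsort(-np.abs(mean_diff))[:top_k]
    return [(int(i), float(mean_diff[i])) for i in order]


# Toy data: feature 3 fires on side A, feature 7 on side B.
rng = np.random.default_rng(0)
side_a = rng.random((100, 32)) * 0.1
side_b = rng.random((100, 32)) * 0.1
side_a[:, 3] += 1.0
side_b[:, 7] += 0.5
top = rank_contrastive_features(side_a, side_b, top_k=2)
print(top)
```

Because the SAE itself never sees the pairs, the same feature dictionary can be reused for any other contrast simply by swapping in a different pair set.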

## Install

For loading and running pretrained SAEs:

```bash
pip install kiji-inspector
```

For the full extraction, training, and analysis workflow, install the `full` extra:

```bash
pip install 'kiji-inspector[full]'
```

## Quick Start

```python
from kiji_inspector import SAE

sae, feature_descriptions = SAE.from_pretrained(
    base_model="nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-BF16",
    layer=20,
)

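# `activations` is assumed to hold hidden states captured from `layer` of
# `base_model` (e.g. via the extraction pipeline below).
features = sae.encode(activations)     # sparse feature activations
reconstruction = sae.decode(features)  # map features back to activation space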
```
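
Building on the `encode`/`decode` calls above, one simple way to probe a feature's role is ablation. Zero the feature in the feature vector, decode, and compare the result against the unablated reconstruction. The sketch below uses a toy NumPy encoder/decoder in place of the real SAE. `ablate_features`, `toy_encode`, and `toy_decode` are hypothetical names. The sketch only measures the shift in the reconstructed activations. A causal ablation experiment typically goes further and feeds the ablated activations back into the model to see whether its tool choice changes, which this sketch does not attempt. The helper assumes NumPy arrays; for PyTorch tensors, `.clone()` would replace `.copy()`.

```python
import numpy as np


def ablate_features(encode, decode, x, feature_ids):
    """Zero the given SAE features and return (baseline, ablated) reconstructions.

    `encode`/`decode` are any callables mapping activations <-> features that
    operate on NumPy arrays.
    """
    feats = encode(x)
    ablated = feats.copy()
    ablated[..., list(feature_ids)] = 0.0  # knock out the chosen features
    return decode(feats), decode(ablated)


# Toy stand-in for an SAE: ReLU encoder with a transposed linear decoder.
rng = np.random.default_rng(0)
W = rng.normal(size=(8, 16))


def toy_encode(x):
    return np.maximum(x @ W, 0.0)


def toy_decode(f):
    return f @ W.T


x = rng.normal(size=(3, 8))
baseline, ablated = ablate_features(toy_encode, toy_decode, x, [0, 5])
# Per-example size of the shift caused by removing features 0 and 5.
shift = np.linalg.norm(baseline - ablated, axis=-1)
print(shift)
```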

Training and data-generation entrypoints live under the package namespace:

```bash
python -m kiji_inspector.generate_pairs 1300
python -m kiji_inspector.pipeline --layers 10 20 30
```

## Local vLLM Patches

For local experiments that need the custom hidden-state extraction changes to `vllm`, rebuild the environment and then apply the patch set from the repository root:

```bash
uv sync --no-cache --refresh --extra full --group dev
./patches/apply-patch.sh
```

The apply script applies every `*.patch` file under [patches](patches/) in lexical order:

- `01_allow_extract_hidden_states.patch`
- `02_support_nemotron_models.patch`
- `03_support_gemma3_models.patch`
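
Lexical order compares filenames character by character, which is why the numeric prefixes are zero-padded. The following Python sketch mimics that ordering to show the effect. It does not reproduce `apply-patch.sh` itself, and `patches_in_apply_order` and the example filenames are hypothetical. Python's `sorted` compares code points, which for ASCII names like these matches a shell glob in the C locale.

```python
import tempfile
from pathlib import Path


def patches_in_apply_order(patch_dir: str) -> list[str]:
    """Return the *.patch filenames in patch_dir sorted in plain lexical order."""
    return sorted(p.name for p in Path(patch_dir).glob("*.patch"))


# Hypothetical filenames: note that "10_..." sorts before "2_..." lexically.
with tempfile.TemporaryDirectory() as d:
    for name in ["02_b.patch", "01_a.patch", "10_c.patch", "2_d.patch", "notes.txt"]:
        (Path(d) / name).touch()
    order = patches_in_apply_order(d)

print(order)  # ['01_a.patch', '02_b.patch', '10_c.patch', '2_d.patch']
```

A new patch should therefore keep the two-digit prefix (e.g. `04_...`) so it lands after the existing ones.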

Additional workflow details live in [patches/README_PATCH.md](patches/README_PATCH.md).

---

## 🤝 Contributing

We welcome contributions! Whether you're fixing a bug, improving documentation, or proposing a new feature, your help is appreciated.

### Ways to Contribute

- **Report Bugs** - [Open an issue](https://github.com/dataiku/kiji-inspector/issues) with steps to reproduce
- **Improve Docs** - Documentation PRs are always welcome
- **Submit Features** - Open an issue to discuss your idea before submitting a PR
- **Share Feedback** - [Start a discussion](https://github.com/dataiku/kiji-inspector/discussions)

### Community

- **Slack** - [Join our community](https://join.slack.com/t/dataiku-opensource/shared_invite/zt-3o6yq14rp-FTtAHZYhyru~jLZ~S6xPLA) to ask questions and connect with other contributors
- **Contributors** - See [CONTRIBUTORS.md](CONTRIBUTORS.md) for the list of people who have contributed

---

## 📄 License

Copyright (c) 2026 Dataiku SAS

This project is licensed under the Apache 2.0 License - see the [LICENSE](LICENSE) file for details.
