Metadata-Version: 2.4
Name: pysteer-adaptation
Version: 0.1.1
Summary: Python activation-steering library for PyTorch and Hugging Face-style language models.
Author-email: Mattia Piazzalunga <mattiapiazzalunga@outlook.com>
Maintainer-email: Mattia Piazzalunga <mattiapiazzalunga@outlook.com>
License-Expression: MPL-2.0
Project-URL: Homepage, https://github.com/mattiapiazzalunga/pysteer
Project-URL: Documentation, https://mattiapiazzalunga.github.io/pysteer/
Project-URL: Issues, https://github.com/mattiapiazzalunga/pysteer/issues
Project-URL: Source, https://github.com/mattiapiazzalunga/pysteer
Project-URL: Changelog, https://github.com/mattiapiazzalunga/pysteer/blob/master/CHANGELOG.md
Keywords: activation-engineering,activation-steering,ai-safety,huggingface,interpretability,large-language-models,language-models,llm,mechanistic-interpretability,model-steering,pytorch,representation-engineering,transformers
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Science/Research
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Topic :: Scientific/Engineering :: Information Analysis
Classifier: Topic :: Text Processing :: Linguistic
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Requires-Python: >=3.10
Description-Content-Type: text/markdown
License-File: LICENSE.txt
Requires-Dist: torch<3,>=2.1
Requires-Dist: pandas<3,>=2.0
Requires-Dist: accelerate<2,>=0.26
Requires-Dist: tqdm<5,>=4.66
Requires-Dist: transformers<5,>=4.40
Provides-Extra: test
Requires-Dist: pytest>=8.3; extra == "test"
Requires-Dist: pytest-cov>=5.0; extra == "test"
Provides-Extra: docs
Requires-Dist: sphinx>=8.2; extra == "docs"
Requires-Dist: myst-parser>=4.0; extra == "docs"
Provides-Extra: dev
Requires-Dist: build>=1.2; extra == "dev"
Requires-Dist: pytest>=8.3; extra == "dev"
Requires-Dist: pytest-cov>=5.0; extra == "dev"
Requires-Dist: ruff>=0.8; extra == "dev"
Requires-Dist: twine>=5.1; extra == "dev"
Dynamic: license-file

<p align="center">
  <img src="https://raw.githubusercontent.com/mattiapiazzalunga/pysteer/master/images/logo.png" alt="pysteer Python activation steering library logo" width="130">
</p>
<p align="center"><em>Python activation steering for LLMs and transformer language models.</em></p>

<p align="center">
  <a href="https://pypi.org/project/pysteer/">
    <img src="https://img.shields.io/pypi/v/pysteer?label=PyPI&logo=pypi" alt="pysteer package on PyPI"/>
  </a>
  <a href="https://pypi.org/project/pysteer/">
    <img src="https://img.shields.io/pypi/pyversions/pysteer?logo=python&logoColor=white" alt="pysteer supported Python versions"/>
  </a>
  <a href="https://opensource.org/licenses/MPL-2.0">
    <img src="https://img.shields.io/badge/License-MPL%202.0-brightgreen.svg" alt="MPL 2.0 license"/>
  </a>
  <a href="https://github.com/mattiapiazzalunga/pysteer/actions">
    <img src="https://img.shields.io/github/actions/workflow/status/mattiapiazzalunga/pysteer/ci.yml?label=CI&logo=github" alt="pysteer continuous integration status"/>
  </a>
  <a href="https://github.com/mattiapiazzalunga/pysteer/issues">
    <img src="https://img.shields.io/github/issues/mattiapiazzalunga/pysteer?logo=github" alt="pysteer GitHub issues"/>
  </a>
</p>

# pysteer: Python Activation Steering for LLMs

`pysteer` is a lightweight Python library for activation steering,
representation engineering, and inference-time model steering in PyTorch
transformer language models. It learns steering artifacts from labeled
prompt/response examples, then applies interventions to intermediate
activations without fine-tuning or modifying model weights.

The package is designed for researchers and developers working on LLM control,
mechanistic interpretability, AI safety experiments, and activation engineering
workflows with Hugging Face-style models.

- PyPI package: <https://pypi.org/project/pysteer/>
- Documentation: <https://mattiapiazzalunga.github.io/pysteer/>
- Source code: <https://github.com/mattiapiazzalunga/pysteer>
- Issues: <https://github.com/mattiapiazzalunga/pysteer/issues>

## Why Use pysteer

- Steer LLM behavior at inference time without retraining the model.
- Compare multiple activation-steering methods behind one `Executor` API.
- Build prompt-routed, adaptive, or gradient-derived steering workflows.
- Extend the steering engine with custom derivation and runtime strategies.
- Keep activation hooks scoped with a context-managed runtime wrapper.

## Features

- Training-time activation extraction from selected transformer layers.
- Built-in steering methods: CMD, CPCA, ACTS-CMD, ACTS-CPCA, MBS-CMD,
  Angular Steering, Adaptive Activation Steering, COLD-Kernel, and COLD-Steer.
- A registry-based extension layer for adding new derivation/runtime methods
  without editing `Executor`.
- A context-managed runtime wrapper that keeps steering hooks scoped to the
  calls where they are intended.
- Sphinx documentation with autodoc, Napoleon docstrings, API reference pages,
  and a `make open` target that builds the docs and opens them in a browser.

## Use Cases

- LLM activation steering and behavior control from labeled examples.
- Representation engineering experiments on residual stream activations.
- Mechanistic interpretability prototypes that compare steering directions.
- Inference-time intervention workflows where model weights should stay frozen.
- Custom activation-engineering methods for PyTorch transformer models.

## Installation

Install from PyPI:

```bash
python -m pip install pysteer
```

Install from a local checkout for development:

```bash
python -m pip install -e ".[dev,docs]"
```

Install only the runtime dependencies when working from source without an
editable install:

```bash
python -m pip install -r REQUIREMENTS.txt
```

Install documentation dependencies only when building the docs:

```bash
python -m pip install -r docs/requirements.txt
```

## Minimal Example

The core entry point is `pysteer.Executor`. Training data uses `prompt`,
`response`, and `reference` columns. For contrastive steering methods,
`reference` is `1` for the desired (positive) response class and `0` for the
contrasting (negative) one.

```python
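import pandas as pd
from transformers import AutoModelForCausalLM, AutoTokenizer

from pysteer import Executor

# Placeholder checkpoint name: substitute any causal LM that actually has the
# layer indices passed to `layers_to_extract` below.
model_name = "your-org/your-model"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
inputs = tokenizer("Question", return_tensors="pt")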

train_df = pd.DataFrame(
    [
        {"prompt": "Question", "response": "Helpful answer", "reference": 1},
        {"prompt": "Question", "response": "Unhelpful answer", "reference": 0},
    ]
)

executor = Executor(
    model=model,
    tokenizer=tokenizer,
    train_df=train_df,
    method="cmd",
    layers_to_extract=[12, 16, 20],
    alpha=0.5,
)

wrapper = executor.representation_extractor()

with wrapper as steered_model:
    output = steered_model.generate(**inputs, max_new_tokens=64)
```
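
The `with` block is what scopes the steering hooks to the calls made inside it. The pattern can be pictured with a small, library-independent sketch. This is an illustration only, not pysteer's implementation: `ToyLayer` and `scoped_hook` are hypothetical names. In PyTorch the analogous mechanism is `nn.Module.register_forward_hook`, which returns a handle with a `remove()` method.

```python
from contextlib import contextmanager


class ToyLayer:
    """Stand-in for a model layer (hypothetical, not part of pysteer)."""

    def __init__(self):
        self.hooks = []

    def forward(self, x):
        # Apply every registered hook to the layer output, in order.
        out = x
        for hook in self.hooks:
            out = hook(out)
        return out


@contextmanager
def scoped_hook(layer, hook):
    """Attach `hook` for the duration of the `with` block only."""
    layer.hooks.append(hook)
    try:
        yield layer
    finally:
        # Runs even if the body raises, so the hook never leaks.
        layer.hooks.remove(hook)


layer = ToyLayer()
steer = lambda x: x + 0.5  # toy additive intervention

with scoped_hook(layer, steer) as steered:
    inside = steered.forward(1.0)  # hook active
outside = layer.forward(1.0)       # hook already removed
```

The `try`/`finally` is the important part: the hook is detached even when generation raises, so later calls on the same model run unsteered.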

Built-in standard methods expect `prompt`, `response`, and `reference`.
Routed methods add their own grouping columns, such as `task_id`, `mbs_layer`,
or ACT grouping identifiers.

Training rows are validated before hooks are attached. `reference` must contain
only `0` and `1`, and every contrastive training scope needs at least one
positive and one negative row. For standard methods the scope is the full
dataframe; ACTS validates each integer-like `task_id`; MBS-CMD validates each
selected `mbs_layer`; ACT validates each normalized ACT group.
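
The rules above can be approximated with a short pre-check that callers may run on their own data. This is a sketch under assumptions, not pysteer's validator: `validate_rows` and `scope_key` are hypothetical names, and rows are plain dicts (what `DataFrame.to_dict("records")` returns) so the example has no dependencies.

```python
def validate_rows(rows, scope_key=None):
    """Check `reference` labels per scope; raise ValueError on failure.

    rows: list of dicts with at least a `reference` key.
    scope_key: optional column (e.g. "task_id") that defines each scope;
        None treats all rows as a single scope.
    """
    scopes = {}
    for row in rows:
        label = row["reference"]
        if label not in (0, 1):
            raise ValueError(f"reference must be 0 or 1, got {label!r}")
        scope = row[scope_key] if scope_key else "all"
        scopes.setdefault(scope, set()).add(label)
    for scope, labels in scopes.items():
        if labels != {0, 1}:
            raise ValueError(
                f"scope {scope!r} needs both a positive and a negative row"
            )
    return sorted(scopes)


rows = [
    {"prompt": "Q", "response": "A", "reference": 1, "task_id": 0},
    {"prompt": "Q", "response": "B", "reference": 0, "task_id": 0},
    {"prompt": "Q", "response": "C", "reference": 1, "task_id": 1},
]
validate_rows(rows)  # passes: the full dataframe scope has both labels
# validate_rows(rows, scope_key="task_id") raises: task 1 has no negative row
```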

## Supported Steering Methods

`pysteer` ships with a default registry of activation-steering methods:

- `cmd`: Contrastive Mean Difference steering vectors.
- `cpca`: Contrastive PCA steering directions.
- `acts_cmd`: ACTS prompt-routed CMD steering.
- `acts_cpca`: ACTS prompt-routed CPCA steering.
- `mbs_cmd`: layer-balanced CMD steering.
- `angular`: Angular Steering with plane rotations.
- `act`: Adaptive Activation Steering with prompt clustering and probes.
- `cold_kernel`: gradient-derived COLD-Kernel steering directions.
- `cold_steer`: inference-efficient COLD-Steer alias.

## Architecture

The library separates steering into four concerns:

- Derivation: how an artifact is learned from activations.
- Artifact: the vector, plane, routing table, probe, or richer object produced.
- Site: where the artifact reads or writes model state.
- Runtime policy: when and how the intervention is applied.

The `steering_engine` package contains the extension API:

- `domain.py` defines declarative data structures such as `ActivationSite`,
  `InterventionSpec`, `SteeringArtifact`, and `SteeringMethodSpec`.
- `components.py` defines protocols for readers, derivers, runtime strategies,
  schedules, controllers, and compilers.
- `registry.py` provides `SteeringMethodRegistry` and `MethodDefinition`.
- `defaults.py` registers the built-in methods.

See `docs/activation_steering_architecture.md` for the design rationale and
taxonomy.
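
The registry idea itself is easy to illustrate without the library: method identifiers map to factories, so adding a method is a registration call rather than an edit to the dispatching code. The sketch below is a generic version of that pattern. `ToyRegistry` is a hypothetical class and does not mirror the interface of `SteeringMethodRegistry`.

```python
class ToyRegistry:
    """Maps method ids to factories (hypothetical; not pysteer's class)."""

    def __init__(self):
        self._factories = {}

    def register(self, method_id, factory):
        # Refuse silent overwrites so two methods cannot share an id.
        if method_id in self._factories:
            raise ValueError(f"method {method_id!r} is already registered")
        self._factories[method_id] = factory

    def build(self, method_id, **context):
        try:
            factory = self._factories[method_id]
        except KeyError:
            raise KeyError(f"unknown method {method_id!r}") from None
        return factory(**context)


registry = ToyRegistry()
# A factory returns whatever object implements the method; here, a closure
# that scales its input by `alpha`.
registry.register("scale", lambda alpha: (lambda x: alpha * x))

method = registry.build("scale", alpha=0.5)
result = method(4.0)
```

Because callers only look methods up by id, the code that consumes the registry never needs to change when a new method is registered.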

## Extending Methods

Register a new method with a vector factory and a runtime strategy builder:

```python
from steering_engine import MethodDefinition, SteeringMethodRegistry
from steering_engine.domain import DerivationFamily, InterventionKind
from steering_engine.domain import RuntimeFamily, SteeringMethodSpec

registry = SteeringMethodRegistry()
registry.register(
    MethodDefinition(
        spec=SteeringMethodSpec(
            method_id="my_method",
            label="My Method",
            derivation_family=DerivationFamily.CUSTOM,
            runtime_family=RuntimeFamily.STATIC,
            intervention_kind=InterventionKind.ADD,
        ),
        vector_factory=lambda ctx: MyVectorDeriver(...),
        strategy_builder=lambda deriver, ctx: MyRuntimeStrategy(...),
    )
)
```

## Documentation

Build the Sphinx HTML documentation:

```bash
make -C docs html
```

Build and open it in your default browser:

```bash
make -C docs open
```

On Windows without `make`:

```powershell
docs\make.bat html
docs\make.bat open
```

The generated site is written to `docs/_build/html/index.html`.

## Contributing

See `CONTRIBUTING.md` for development setup, local checks, and the preferred
extension path for new steering methods. Security reports should follow
`SECURITY.md`.

## Evaluation Data

`pysteer` focuses on the generic steering engine and expects callers to provide
their own training dataframes for application-specific evaluations.

## License

This project is licensed under the Mozilla Public License 2.0. See
`LICENSE.txt`.
