Metadata-Version: 2.4
Name: gutenberg-sdk
Version: 0.1.0
Summary: Python SDK for the Gutenberg SAE Activation API
Project-URL: Homepage, https://gutenberg.ai
Project-URL: Console, https://console.gutenberg.ai
Project-URL: Documentation, https://console.gutenberg.ai/d/docs
Project-URL: Repository, https://github.com/gutenbergpbc/code
Author: Gutenberg PBC
License: MIT
License-File: LICENSE
Keywords: activations,interpretability,llm,mechanistic-interpretability,observability,sae
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Typing :: Typed
Requires-Python: >=3.10
Requires-Dist: httpx>=0.28
Requires-Dist: pydantic>=2.0
Requires-Dist: tqdm>=4.66
Provides-Extra: pandas
Requires-Dist: pandas>=2.0; extra == 'pandas'
Requires-Dist: pyarrow>=14; extra == 'pandas'
Description-Content-Type: text/markdown

# gutenberg-sdk

Python SDK for the **Gutenberg** SAE activation API — interpretability-based
observability for language models. Upload text, read it through a sparse
autoencoder (SAE) feature dictionary, and find the features that separate any
two classes of documents.

- **Docs:** https://console.gutenberg.ai/d/docs
- **Console:** https://console.gutenberg.ai
- **Get an API key:** https://console.gutenberg.ai/d/keys

## Install

```bash
uv add gutenberg-sdk          # or: pip install gutenberg-sdk
```

The distribution is `gutenberg-sdk`; the import is `gutenberg`:

```python
from gutenberg import gutenberg
```

Install the optional `pandas` extra for `load_activations_df` and the parquet helpers:

```bash
uv add "gutenberg-sdk[pandas]"
```

## Quickstart

```python
from gutenberg import gutenberg

client = gutenberg(api_key="gtn_...")   # or set GUTENBERG_API_KEY

# 1. upload a parquet dataset (text + a binary target column)
dataset = client.datasets.upload("examples/simple_binary_features_extraction_100.parquet")

# 2. launch hosted SAE feature extraction
job = client.jobs.create(
    dataset_id=dataset.dataset_id,
    model_id="google/gemma-3-27b-it",
    sae_id="layer_31_width_262k_l0_medium",
)
job = client.jobs.wait(job.job_id)

# 3. score every feature against the target with AUROC
exp = client.experiments.create(
    job_id=job.job_id,
    target_column="is_ai",
    target_column_type="binary",
    positive_value="1",
    scoring_method="auroc",
)
exp = client.experiments.wait(exp.experiment_id)

# 4. read back the ten top-ranked features
for feature in client.experiments.features(exp.experiment_id)[:10]:
    print(feature.rank, feature.feature_id, feature.score)
```

The full runnable script lives in
[`examples/simple_binary_features_extraction.py`](examples/simple_binary_features_extraction.py),
with a companion 100-row parquet. Against the production API, the whole flow
takes a couple of minutes. See [`docs/getting-started.md`](docs/getting-started.md) for the
walkthrough, SAE selection guidance, and how token examples are served.
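
Step 3 of the quickstart scores each feature by AUROC. As a rough intuition for
what that number means, here is a minimal standalone sketch. It is not the
SDK's or the API's implementation: the `auroc` and `rank_features` helpers, the
feature names, the toy per-document activations, and the choice to rank by
distance from 0.5 are all assumptions made for illustration.

```python
from collections.abc import Sequence


def auroc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Probability that a random positive outscores a random negative.

    Ties count as half a win. Labels are assumed to be 0 or 1.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def rank_features(
    activations: dict[str, Sequence[float]], labels: Sequence[int]
) -> list[tuple[str, float]]:
    """Score every feature and sort by distance from chance (0.5).

    Ranking by |AUROC - 0.5| is an illustrative choice, not the API's rule.
    """
    scored = [(name, auroc(values, labels)) for name, values in activations.items()]
    return sorted(scored, key=lambda item: abs(item[1] - 0.5), reverse=True)


# Toy data: one activation value per document for three made-up features.
labels = [1, 1, 1, 0, 0, 0]
activations = {
    "feature_a": [0.9, 0.8, 0.7, 0.1, 0.0, 0.2],  # higher on every positive
    "feature_b": [0.3, 0.0, 0.5, 0.4, 0.1, 0.2],  # barely better than chance
    "feature_c": [0.0, 0.1, 0.0, 0.6, 0.9, 0.7],  # higher on every negative
}

for name, score in rank_features(activations, labels):
    print(f"{name}: AUROC={score:.2f}")
```

An AUROC of 1.0 or 0.0 means the feature separates the two classes perfectly
(in one direction or the other), while 0.5 means it carries no signal.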

> The bundled example is a **curated showcase**, not a benchmark — its 50 AI
> passages were picked to exhibit a handful of recognizable AI-writing features,
> so those features separate the two classes near-perfectly there. Run it on
> your own data to see a realistic ranking.

## API surface

The SDK exposes a single `gutenberg(...)` client with namespaced resources
(`datasets`, `jobs`, `experiments`, `aggregations`, `autointerp`,
`meta_autointerp`, `subsets`), plus the sync helpers `activations()`,
`interpret()`, `models()`, and `saes()`.

## License

MIT
