Metadata-Version: 2.4
Name: compactbench
Version: 0.1.0
Summary: Open benchmark for AI conversation compaction methods
Project-URL: Homepage, https://compactbench.github.io/compactbench
Project-URL: Repository, https://github.com/compactbench/compactbench
Project-URL: Documentation, https://compactbench.github.io/compactbench
Project-URL: Issues, https://github.com/compactbench/compactbench/issues
Project-URL: Changelog, https://github.com/compactbench/compactbench/blob/main/CHANGELOG.md
Author: CompactBench contributors
License-Expression: Apache-2.0
License-File: LICENSE
Keywords: ai,benchmark,compaction,context,drift,evaluation,llm,memory,summarization
Classifier: Development Status :: 2 - Pre-Alpha
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: Apache Software License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Typing :: Typed
Requires-Python: >=3.11
Requires-Dist: httpx>=0.27
Requires-Dist: jsonschema>=4.22
Requires-Dist: pydantic-settings>=2.3
Requires-Dist: pydantic>=2.7
Requires-Dist: rich>=13.7
Requires-Dist: ruamel-yaml>=0.18
Requires-Dist: tiktoken>=0.7
Requires-Dist: typer>=0.12
Provides-Extra: dev
Requires-Dist: hypothesis>=6.100; extra == 'dev'
Requires-Dist: pre-commit>=3.7; extra == 'dev'
Requires-Dist: pyright>=1.1.370; extra == 'dev'
Requires-Dist: pytest-asyncio>=0.23; extra == 'dev'
Requires-Dist: pytest-cov>=5.0; extra == 'dev'
Requires-Dist: pytest>=8.2; extra == 'dev'
Requires-Dist: ruff>=0.5; extra == 'dev'
Provides-Extra: docs
Requires-Dist: mkdocs-include-markdown-plugin>=6.2; extra == 'docs'
Requires-Dist: mkdocs-material>=9.5; extra == 'docs'
Provides-Extra: providers
Requires-Dist: google-genai>=0.3; extra == 'providers'
Requires-Dist: groq>=0.9; extra == 'providers'
Requires-Dist: ollama>=0.3; extra == 'providers'
Description-Content-Type: text/markdown

# CompactBench

> Open benchmark for AI conversation compaction methods.

[![PyPI version](https://img.shields.io/pypi/v/compactbench.svg)](https://pypi.org/project/compactbench/)
[![Python](https://img.shields.io/pypi/pyversions/compactbench.svg)](https://pypi.org/project/compactbench/)
[![License](https://img.shields.io/badge/license-Apache%202.0-blue.svg)](LICENSE)
[![CI](https://github.com/compactbench/compactbench/actions/workflows/ci.yml/badge.svg)](https://github.com/compactbench/compactbench/actions/workflows/ci.yml)

CompactBench measures whether language models still behave correctly after a long conversation history is replaced with a compacted representation. It runs adversarial, deterministic, multi-cycle benchmarks and publishes ranked results on a public leaderboard.

- **Deterministic generation** — same template + seed + version always yields the same case
- **Hidden ranked set** — public practice cases for development, hidden templates for ranked scoring
- **Multi-cycle drift** — methods are evaluated across repeated compact → continue → compact loops
- **State-fidelity scoring** — correctness of retained decisions, constraints, and entities, not output style
- **Versioned everywhere** — benchmark suite, template, scorer, model, and method versions are recorded with every result
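
The determinism guarantee can be pictured with a stand-alone sketch. The hashing scheme below is an illustrative assumption, not the actual generator:

```python
import hashlib
import random

def case_rng(template: str, seed: int, version: str) -> random.Random:
    """Derive a reproducible RNG from the (template, seed, version) triple.

    Identical inputs always yield the identical stream, so any generated
    case can be reproduced bit-for-bit; bumping the version deliberately
    re-rolls every case.
    """
    digest = hashlib.sha256(f"{template}:{seed}:{version}".encode()).digest()
    return random.Random(int.from_bytes(digest[:8], "big"))

a = case_rng("buried_constraint_v1", 42, "0.1.0")
b = case_rng("buried_constraint_v1", 42, "0.1.0")
assert a.random() == b.random()  # same triple, same stream
```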

## Install

```bash
pip install compactbench
```

Or with uv (recommended for development):

```bash
uv pip install compactbench
```

## Quickstart

Run a built-in compactor against the starter suite using a local Ollama model:

```bash
compactbench run \
  --method built-in:hybrid-ledger \
  --suite starter \
  --provider ollama \
  --model llama3.2
```
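
Internally, a run repeats compact → continue → compact cycles so that compaction errors can compound. A toy sketch of the loop (the helper names here are hypothetical, not the real runtime API):

```python
def run_drift_cycles(transcript, compact_fn, continue_fn, n_cycles: int) -> list:
    """Run repeated compact -> continue -> compact cycles.

    After each compaction the model continues the task from the artifact
    alone, and the next cycle compacts that continuation, so state lost in
    cycle 1 stays lost (or worsens) in cycles 2..n.
    """
    artifacts = []
    for _ in range(n_cycles):
        artifact = compact_fn(transcript)   # history replaced by the artifact
        artifacts.append(artifact)
        transcript = continue_fn(artifact)  # model keeps working from it
    return artifacts

# Stub functions make the compounding visible:
arts = run_drift_cycles("t0", lambda t: t + "|c", lambda a: a + "|x", 2)
```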

Generate a single case deterministically for inspection:

```bash
compactbench generate --template buried_constraint_v1 --seed 42
```

Score an existing results file:

```bash
compactbench score --results results.jsonl
```
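
State-fidelity scoring rewards retained facts rather than output style. A minimal sketch of the idea (the category names and the unweighted average below are illustrative, not the real scorer):

```python
def state_fidelity(expected: dict[str, set[str]],
                   observed: dict[str, set[str]]) -> float:
    """Fraction of expected state items still present after compaction,
    averaged over categories (decisions, constraints, entities, ...)."""
    per_category = []
    for category, items in expected.items():
        if items:
            kept = items & observed.get(category, set())
            per_category.append(len(kept) / len(items))
    return sum(per_category) / len(per_category) if per_category else 0.0

score = state_fidelity(
    {"decisions": {"use-postgres"}, "constraints": {"budget<=5k", "no-gpl"}},
    {"decisions": {"use-postgres"}, "constraints": {"budget<=5k"}},
)  # (1/1 + 1/2) / 2 = 0.75
```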

## Writing your own compactor

Implement the `Compactor` interface; the runner can load your class directly from a file path.

```python
from compactbench.compactors import Compactor
from compactbench.contracts import CompactionArtifact, Transcript

class MyCompactor(Compactor):
    name = "my-method"
    version = "0.1.0"

    def compact(self, transcript: Transcript, config: dict) -> CompactionArtifact:
        # Produce the compacted representation that replaces the full history.
        ...
```
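
For illustration, the body of `compact` might distill durable state out of the history. This stand-alone sketch uses plain dicts and a line-marker convention that are assumptions for the example, not the real contracts:

```python
def compact_to_ledger(messages: list[dict[str, str]]) -> str:
    """Fold a message history into a compact textual ledger by keeping
    only lines that carry durable state and dropping everything else."""
    markers = ("decision:", "constraint:")
    kept = [
        line.strip()
        for msg in messages
        for line in msg["content"].splitlines()
        if line.strip().lower().startswith(markers)
    ]
    return "\n".join(kept)

ledger = compact_to_ledger([
    {"role": "user", "content": "Decision: ship weekly.\nAlso, how's the weather?"},
    {"role": "assistant", "content": "Constraint: keep p95 under 200 ms."},
])
```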

Then run:

```bash
compactbench run --method path/to/my_compactor.py:MyCompactor --suite elite_practice
```

See [docs/writing-a-compactor.md](docs/writing-a-compactor.md) for full details.

## Leaderboard

The public leaderboard is at **[https://compactbench.github.io/compactbench/leaderboard/](https://compactbench.github.io/compactbench/leaderboard/)**.

Submissions are evaluated against **hidden** ranked benchmark cases by a maintainer-operated runner. To submit:

1. Write and test your compactor locally against `elite_practice`.
2. Open a PR to [`submissions/`](submissions/) with your method source and config.
3. A maintainer runs it against the hidden set and merges if it qualifies.

See [docs/submitting.md](docs/submitting.md) for the full submission protocol.

## Project status

**v0.1.0 launch-ready.** All ten work orders from the implementation roadmap have landed:

- Core: DSL parser, case generation, scoring engine, mock + real providers (Groq / Google AI Studio / Ollama)
- Methods: four built-in compactors (`naive-summary`, `structured-state`, `hierarchical-summary`, `hybrid-ledger`)
- Runtime: end-to-end `compactbench run` with drift cycles, JSONL event log, `--resume`
- Leaderboard: PR-based submission workflow on GitHub-hosted runners, static site fed by a qualification + ranking core
- Content: 15 public Elite practice templates + 15 hidden ranked templates across three launch families
- Release: PyPI trusted-publishing workflow wired up; tag `v0.1.0` to ship

See [CHANGELOG.md](CHANGELOG.md) for the full breakdown. Post-launch work (hidden-set content expansion, additional template families, shadow evaluation automation, custom domain) is tracked via GitHub issues.

## Contributing

Bug reports, template proposals, and new compactors are welcome. See [CONTRIBUTING.md](CONTRIBUTING.md).

Please also read our [Code of Conduct](CODE_OF_CONDUCT.md).

## License

Apache License 2.0 — see [LICENSE](LICENSE).
