Metadata-Version: 2.4
Name: tokenpress
Version: 0.1.1
Classifier: Development Status :: 3 - Alpha
Classifier: Programming Language :: Rust
Classifier: Programming Language :: Python :: Implementation :: CPython
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Requires-Dist: numpy>=1.21
Requires-Dist: tokenpress[models,openai,anthropic] ; extra == 'all'
Requires-Dist: anthropic>=0.20 ; extra == 'anthropic'
Requires-Dist: transformers>=4.30 ; extra == 'models'
Requires-Dist: torch>=2.0 ; extra == 'models'
Requires-Dist: openai>=1.0 ; extra == 'openai'
Provides-Extra: all
Provides-Extra: anthropic
Provides-Extra: models
Provides-Extra: openai
License-File: LICENSE
Summary: Blazing-fast LLM token compression engine. Built in Rust. Reduce API costs 2-5x with near-zero quality loss.
License: Apache-2.0
Requires-Python: >=3.9
Description-Content-Type: text/markdown; charset=UTF-8; variant=GFM
Project-URL: Homepage, https://github.com/zamalali/tokenpress
Project-URL: Issues, https://github.com/zamalali/tokenpress/issues
Project-URL: Repository, https://github.com/zamalali/tokenpress

<p align="center">
  <img src="https://raw.githubusercontent.com/zamalali/tokenpress/master/logo.png" alt="TokenPress" width="240">
</p>

<h1 align="center">TokenPress</h1>

<p align="center">
  <strong>Blazing-fast LLM token compression engine, built in Rust.</strong><br>
  2–5× fewer tokens. Same or better output quality. Any model, any language.
</p>

<p align="center">
  <a href="https://pypi.org/project/tokenpress/"><img alt="PyPI" src="https://img.shields.io/badge/pypi-v0.1.1-blue?style=flat-square&logo=pypi&logoColor=white"></a>
  <a href="https://pypi.org/project/tokenpress/"><img alt="Downloads" src="https://img.shields.io/badge/downloads-new-blue?style=flat-square&logo=pypi&logoColor=white"></a>
  <a href="https://github.com/zamalali/tokenpress/blob/master/LICENSE"><img alt="License" src="https://img.shields.io/badge/license-Apache--2.0-green?style=flat-square"></a>
  <a href="https://github.com/zamalali/tokenpress/actions"><img alt="CI" src="https://img.shields.io/badge/tests-27%20rust%20%2B%209%20python-brightgreen?style=flat-square"></a>
  <img alt="Rust" src="https://img.shields.io/badge/core-Rust-F74C00?style=flat-square&logo=rust&logoColor=white">
  <img alt="Python" src="https://img.shields.io/badge/api-Python%203.9%2B-3776AB?style=flat-square&logo=python&logoColor=white">
</p>

---

## Why TokenPress?

LLM API calls are billed per token, and many prompt tokens are redundant: filler words, predictable syntax, boilerplate. TokenPress uses **information theory** to score every token by its surprise value (self-information), keeps the highest-scoring tokens, and sends the compressed prompt to the LLM.

- **18.6 M tok/s** Rust core (PyO3 + rayon)
- **3.3× compression** with near-zero quality loss
- **Drop-in wrappers** for OpenAI & Anthropic: wrap the client once, existing calls stay unchanged
- **6 selection strategies**: ratio, top-k, threshold, percentile, IQR, token merging
- **Language & model agnostic** — works on any text, any LLM
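
To make the selection strategies listed above concrete, here is a minimal pure-Python sketch of how four of them (ratio, top-k, threshold, percentile) could turn per-token scores into a set of kept tokens. The function names, parameters, and scores are hypothetical and chosen for illustration; this is not TokenPress's Rust implementation, and IQR and token merging are left out.

```python
import statistics

def select_top_k(scores, k):
    """Indices of the k highest-scoring tokens, in original order."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k])

def select_ratio(scores, ratio):
    """Keep the top `ratio` fraction of tokens (at least one)."""
    k = max(1, round(len(scores) * ratio))
    return select_top_k(scores, k)

def select_threshold(scores, min_bits):
    """Keep every token whose score is at least `min_bits`."""
    return [i for i, s in enumerate(scores) if s >= min_bits]

def select_percentile(scores, pct):
    """Keep tokens at or above the pct-th percentile (1 <= pct <= 99)."""
    cut = statistics.quantiles(scores, n=100, method="inclusive")[pct - 1]
    return [i for i, s in enumerate(scores) if s >= cut]

# Made-up surprise scores in bits, one per token (illustrative only).
tokens = ["the", "transformer", "of", "attention", "layer"]
scores = [0.5, 6.0, 0.2, 4.0, 1.0]

def kept(indices):
    return [tokens[i] for i in indices]

print(kept(select_ratio(scores, 0.4)))      # ['transformer', 'attention']
print(kept(select_top_k(scores, 3)))        # ['transformer', 'attention', 'layer']
print(kept(select_threshold(scores, 1.0)))  # ['transformer', 'attention', 'layer']
print(kept(select_percentile(scores, 50)))  # ['transformer', 'attention', 'layer']
```

Ratio and top-k fix the output size in advance, while threshold and percentile let the score distribution decide how many tokens survive.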

## Install

```bash
pip install tokenpress              # core
pip install "tokenpress[models]"    # + scoring model (distilgpt2)
pip install "tokenpress[all]"       # + OpenAI & Anthropic wrappers
```

## Quick Start

```python
import tokenpress

result = tokenpress.compress("Your very long prompt …", ratio=0.3)
print(result.compressed_text)       # compressed prompt
print(f"{result.compression_ratio:.1f}× smaller, {result.savings_percentage:.0f}% tokens saved")
```
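
As a rough guide to what `ratio` implies, the arithmetic below assumes that `ratio` is the fraction of tokens kept (an assumption; check the full docs for the exact definition). The helper `expected_savings` is hypothetical and not part of the TokenPress API.

```python
def expected_savings(original_tokens: int, ratio: float):
    """Estimate kept tokens, compression factor, and percent saved.

    Assumes `ratio` is the fraction of tokens kept after compression.
    """
    kept = max(1, round(original_tokens * ratio))
    compression = original_tokens / kept
    savings_pct = 100.0 * (1.0 - kept / original_tokens)
    return kept, compression, savings_pct

kept, compression, savings_pct = expected_savings(1000, ratio=0.3)
print(f"{kept} tokens kept, {compression:.1f}x smaller, {savings_pct:.0f}% saved")
```

Under that assumption, keeping 30% of a 1,000-token prompt leaves 300 tokens, which is about 3.3× smaller.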

**Drop-in OpenAI wrapper:**

```python
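import openai
import tokenpress

very_long_prompt = "Your very long prompt …"  # placeholder: substitute your own text

# Wrap the client once; later calls use the normal OpenAI interface.
client = tokenpress.wrap(openai.OpenAI(), ratio=0.3)
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": very_long_prompt}],
)
# Same API, about 3× fewer prompt tokens (output tokens are billed as usual).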
```
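
Conceptually, a wrapper like this has to rewrite message contents before the request goes out. The sketch below shows one way that step could look. It is an assumption about the general technique, not TokenPress's actual code: `compress_messages` and the toy `drop_filler` compressor are hypothetical names, and a real compressor would score tokens with a language model instead of using a word list.

```python
def compress_messages(messages, compress_fn, roles=("user",)):
    """Return a copy of `messages` with the text of selected roles compressed.

    Messages from other roles (for example "system") are copied unchanged.
    """
    out = []
    for message in messages:
        content = message.get("content")
        if message.get("role") in roles and isinstance(content, str):
            out.append({**message, "content": compress_fn(content)})
        else:
            out.append(dict(message))
    return out

def drop_filler(text):
    """Toy stand-in for a real compressor: drops a few filler words."""
    filler = {"the", "a", "an", "of", "please", "very"}
    return " ".join(w for w in text.split() if w.lower() not in filler)

messages = [
    {"role": "system", "content": "You are a helpful assistant"},
    {"role": "user", "content": "Please summarize the history of the Roman Empire"},
]
compressed = compress_messages(messages, drop_filler)
print(compressed[1]["content"])  # summarize history Roman Empire
```

Leaving system messages untouched by default is a design choice in this sketch: instructions are usually short and sensitive to wording, while long user context is where most of the savings are.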

**CLI:**

```bash
tokenpress compress document.txt -r 0.3 -o compressed.txt
tokenpress bench document.txt
```

## How It Works

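Each token is scored by its **self-information**, measured in bits: $I(x_i) = -\log_2 P(x_i \mid x_{<i})$, where $P(x_i \mid x_{<i})$ is the probability the scoring model assigns to token $x_i$ given the tokens before it.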

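High surprise → keep: the token carries information the model could not have predicted. Low surprise → remove: the token is largely recoverable from context. Pure math, no heuristics.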
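
The scoring rule can be illustrated in a few lines of Python. This is a minimal sketch, not the library's implementation: the probabilities are made up for illustration, whereas in practice they would come from a scoring model, and `compress_by_surprise` is a hypothetical helper name.

```python
import math

def self_information(prob):
    """Surprise of one token in bits: I = -log2(P)."""
    return -math.log2(prob)

def compress_by_surprise(tokens, probs, ratio):
    """Keep the `ratio` fraction of tokens with the highest surprise,
    preserving their original order."""
    scores = [self_information(p) for p in probs]
    k = max(1, round(len(tokens) * ratio))
    ranked = sorted(range(len(tokens)), key=lambda i: scores[i], reverse=True)
    return [tokens[i] for i in sorted(ranked[:k])]

# Made-up conditional probabilities P(x_i | x_<i), for illustration only.
tokens = ["The", "capital", "of", "France", "is", "Paris"]
probs = [0.20, 0.01, 0.90, 0.02, 0.85, 0.05]

print(self_information(0.5))                           # 1.0
print(compress_by_surprise(tokens, probs, ratio=0.5))  # ['capital', 'France', 'Paris']
```

Predictable words such as "of" (P = 0.90, about 0.15 bits) score far lower than "capital" (P = 0.01, about 6.6 bits), so they are dropped first.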

Built on research from **Selective Context** (EMNLP '23), **TRIM** (COLING '25), **Token Merging** (ICLR '23), **H2O** (NeurIPS '23), and **LLMLingua-2** (ACL '24).

## Links

- [GitHub](https://github.com/zamalali/tokenpress) — full docs, benchmarks, architecture
- [Benchmarks](https://github.com/zamalali/tokenpress/blob/master/BENCHMARKS.md) — 7 reproducible benchmark suites
- [License](https://github.com/zamalali/tokenpress/blob/master/LICENSE) — Apache-2.0

