Metadata-Version: 2.4
Name: supercompress
Version: 0.5.0
Summary: SuperCompress — learned context compression for LLMs.
Author-email: "Arjun K. Shah" <arjunk.shah21@gmail.com>
License: MIT
Project-URL: Homepage, https://supercompress.vercel.app
Project-URL: Documentation, https://arjunkshah-supercompress-55.mintlify.app/
Project-URL: Repository, https://github.com/arjunkshah/supercompress
Keywords: llm,compression,prompt-compression,token-reduction,context-compression,ai-optimization,inference-optimization
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Requires-Python: >=3.10
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: torch>=2.0.0
Requires-Dist: numpy>=1.24.0
Provides-Extra: dev
Requires-Dist: pytest>=7.0.0; extra == "dev"
Requires-Dist: ruff>=0.4.0; extra == "dev"
Requires-Dist: httpx>=0.27.0; extra == "dev"
Provides-Extra: serve
Requires-Dist: fastapi>=0.115.0; extra == "serve"
Requires-Dist: uvicorn[standard]>=0.30.0; extra == "serve"
Requires-Dist: pydantic>=2.0.0; extra == "serve"
Requires-Dist: httpx>=0.27.0; extra == "serve"
Provides-Extra: firebase
Requires-Dist: firebase-admin>=6.5.0; extra == "firebase"
Dynamic: license-file

# SuperCompress

**Learned context compression for LLMs**: trim long prompts before inference using a small CPU policy, with quality measured against baselines and documented environmental-impact estimates.

[![GitHub stars](https://img.shields.io/github/stars/arjunkshah/supercompress?style=flat&logo=github)](https://github.com/arjunkshah/supercompress/stargazers)
[![PyPI version](https://img.shields.io/pypi/v/supercompress?style=flat&logo=python&logoColor=white)](https://pypi.org/project/supercompress/)
[![Python](https://img.shields.io/pypi/pyversions/supercompress?style=flat&logo=python&logoColor=white)](https://pypi.org/project/supercompress/)
[![License](https://img.shields.io/github/license/arjunkshah/supercompress?style=flat)](LICENSE)
[![Tests](https://img.shields.io/badge/tests-65%20passing-brightgreen?style=flat)](#development)

| | |
|---|---|
| **Live site** | [supercompress.vercel.app](https://supercompress.vercel.app) |
| **Documentation** | [arjunkshah-supercompress-55.mintlify.app](https://arjunkshah-supercompress-55.mintlify.app/) |
| **API dashboard** | [`/dashboard`](https://supercompress.vercel.app/dashboard) on the live site |
| **Hosted API** | Served from the same Vercel origin: `/api/health` and `/api/v1/compress` |

---

<p align="center">
  <a href="https://supercompress.vercel.app/playground">
    <img src="web/assets/img/architecture.svg" alt="SuperCompress architecture: context and question enter, compression policy keeps answer-critical lines, compressed context sent to LLM" width="700" />
  </a>
</p>
<p align="center">
  <em><a href="https://supercompress.vercel.app/playground">Open the interactive playground →</a></em>
</p>

## Why SuperCompress?

Long agent context is expensive. Blind truncation keeps head and tail but drops answers in the middle. SuperCompress learns **which lines to keep** for the current question — under a fixed token budget.

| Metric | SuperCompress | Truncation / FIFO |
|--------|---------------|-------------------|
| KV savings @ 35% budget | **~65%** | ~65% |
| Oracle recall | **100%** | ~25% |
| Policy size | **~5K params** | rule-based |
| Runs on | **CPU** (pre-inference) | CPU |
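
To make "which lines to keep under a fixed token budget" concrete, here is a minimal sketch that contrasts head-only truncation with query-aware line selection. It rests on several assumptions. Tokens are approximated by a word regex rather than a real tokenizer. The scorer is a hypothetical query-word-overlap heuristic standing in for SuperCompress's learned ~5K-param policy. All function names and the toy context are illustrative.

```python
import re

def tokens(text: str) -> list[str]:
    # Crude word tokenizer: a stand-in for the LLM's real tokenizer.
    return re.findall(r"\w+", text.lower())

def token_budget(lines: list[str], budget_ratio: float) -> int:
    # Budget = fraction of the total context tokens that may be kept.
    return int(sum(len(tokens(line)) for line in lines) * budget_ratio)

def truncate_head(lines: list[str], budget_ratio: float) -> list[str]:
    """FIFO-style baseline: keep lines from the top until the budget runs out."""
    budget, used, kept = token_budget(lines, budget_ratio), 0, []
    for line in lines:
        n = len(tokens(line))
        if used + n > budget:
            break
        kept.append(line)
        used += n
    return kept

def select_lines(lines: list[str], query: str, budget_ratio: float) -> list[str]:
    """Query-aware selection: keep the highest-scoring lines that fit the budget,
    then restore their original order."""
    budget, used = token_budget(lines, budget_ratio), 0
    q = set(tokens(query))
    # Hypothetical scorer (query-word overlap); SuperCompress uses a learned policy here.
    order = sorted(range(len(lines)), key=lambda i: -len(q & set(tokens(lines[i]))))
    chosen = []
    for i in order:
        n = len(tokens(lines[i]))
        if used + n <= budget:
            chosen.append(i)
            used += n
    return [lines[i] for i in sorted(chosen)]

context = [
    "config loads settings from disk",
    "logger writes messages to stdout",
    "cache stores recent results in memory",
    "fetch returns None when the row is missing",
    "retry wraps calls with exponential backoff",
    "metrics exports counters for monitoring",
]
query = "What does fetch return when the row is missing?"

print(truncate_head(context, 0.35))          # keeps the first two lines; the answer is dropped
print(select_lines(context, query, 0.35))    # → ['fetch returns None when the row is missing']
```

Both policies spend the same 35% budget (12 of 35 word-tokens here). Only the query-aware one keeps the line from the middle that actually answers the question.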

Estimated at 1M compressions: **~800M tokens · 29 kWh · 12 kg CO₂ avoided**. See the [environment guide](https://arjunkshah-supercompress-55.mintlify.app/guides/environment) for the methodology.
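
The headline estimate can be unpacked with simple arithmetic. This sketch only rederives per-call and per-kWh ratios from the three quoted figures. It is not the project's methodology, which is documented in the environment guide. The ~65% savings rate used to back out the original context size is an assumption borrowed from the KV-savings row in the table above.

```python
# Headline estimates quoted above, for 1M compressions.
compressions = 1_000_000
tokens_avoided = 800_000_000
energy_kwh = 29.0
co2_kg = 12.0

tokens_per_call = tokens_avoided / compressions                  # tokens avoided per call
wh_per_1k_tokens = energy_kwh * 1000 / (tokens_avoided / 1000)   # Wh per 1K tokens avoided
grid_kg_per_kwh = co2_kg / energy_kwh                            # implied carbon intensity

# Assumption: ~65% of each context is dropped (35% budget), so the
# average original context size is roughly:
original_tokens_per_call = tokens_per_call / 0.65

print(f"{tokens_per_call:.0f} tokens/call, {wh_per_1k_tokens:.5f} Wh per 1K tokens, "
      f"{grid_kg_per_kwh:.2f} kg CO2/kWh, ~{original_tokens_per_call:.0f} original tokens/call")
```

Working through the figures this way shows that they imply about 800 tokens avoided per call and a grid intensity of roughly 0.41 kg CO₂/kWh. These are the kind of assumptions the environment guide documents.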

### Hosted API (Vercel)

The live site ships serverless API routes backed by Vercel Blob for key storage. No separate deploy step — push to main and Vercel builds static `web/` plus `api/`.

Optional self-host: Docker, Fly.io (`fly.toml`), or Render (`render.yaml`) for the Python FastAPI stack.

---

## Quick start

### Install (local compression)

```bash
pip install git+https://github.com/arjunkshah/supercompress.git
# or, from a local clone of the repo: dev tools, tests, and the API server
pip install -e ".[dev,serve]"
```

### Python (in-process)

```python
from supercompress import compress_context

result = compress_context(
    "long context text…",
    "What does fetch return when the row is missing?",
    budget_ratio=0.35,
)
print(result.compressed_text)
print(f"{result.kv_savings_pct:.1f}% KV saved · {result.kept_tokens}/{result.original_tokens} tokens")
```

### Hosted API (recommended)

**1. Get a key** — [dashboard](https://supercompress.vercel.app/dashboard) → Create key → copy `sc_live_…`

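**2. Install & call** (`SuperCompress` talks to the API over a stdlib HTTP client, so compression does not run PyTorch locally; `pip install` still pulls in `torch`, which is a declared dependency):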

```bash
pip install git+https://github.com/arjunkshah/supercompress.git
export SUPERCOMPRESS_API_KEY=sc_live_YOUR_KEY
```

```python
from supercompress import SuperCompress

sc = SuperCompress()  # reads SUPERCOMPRESS_API_KEY
out = sc.compress("long context…", "What does fetch return?")
print(out.compressed_text)  # send to your LLM
```

Or raw HTTP:

```bash
curl -X POST https://supercompress.vercel.app/api/v1/compress \
  -H "X-API-Key: sc_live_YOUR_KEY" \
  -H "Content-Type: application/json" \
  -d '{"context":"…","query":"Summarize","budget_ratio":0.35}'
```

On the live site, the dashboard hits the same origin — no `SC_API_BASE` config needed.

**Local dev** (no Firebase):

```bash
SC_AUTH_DEV=1 SC_KEY_STORE=memory python scripts/local_web_server.py
# → http://127.0.0.1:8790/dashboard
```

**Deploy API** (Docker / Fly.io / Render): see the [API dashboard guide](https://arjunkshah-supercompress-55.mintlify.app/guides/api-dashboard).

### Browser demo

Open [`web/index.html`](web/index.html) or deploy the static `web/` folder. Compression runs client-side — no API key required for the playground.

---

## Documentation

Full docs: **[arjunkshah-supercompress-55.mintlify.app](https://arjunkshah-supercompress-55.mintlify.app/)**

| Doc | Description |
|-----|-------------|
| [Quickstart](https://arjunkshah-supercompress-55.mintlify.app/quickstart) | First compression in minutes |
| [API reference](https://arjunkshah-supercompress-55.mintlify.app/api-reference/http-overview) | Python + HTTP endpoints |
| [API dashboard](https://arjunkshah-supercompress-55.mintlify.app/guides/api-dashboard) | Keys, auth, usage |
| [Integrations](https://arjunkshah-supercompress-55.mintlify.app/guides/integrations) | OpenAI, LangChain, LlamaIndex |
| [Environment](https://arjunkshah-supercompress-55.mintlify.app/guides/environment) | kWh / CO₂ methodology |

Repo copies also live under [`docs/`](docs/).

---

## Benchmarks

```bash
python scripts/benchmark_web.py    # regenerates web/assets/data/benchmarks.json
python scripts/generate_charts.py  # SVG charts for landing page
pytest tests/ -q                   # 65 tests
```

Full benchmarks: [supercompress.vercel.app/benchmarks](https://supercompress.vercel.app/benchmarks)

Policy comparison (8 seeds, budget 0.35):

| Policy | Oracle recall | Entity recall | Latency |
|--------|---------------|---------------|---------|
| FIFO / Truncation | 25% | 73% | ~57 ms |
| Summarization | 61% | 65% | ~63 ms |
| H2O | 98% | 73% | ~56 ms |
| **SuperCompress** | **100%** | **73%** | ~60 ms |
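
The recall columns compare what each policy keeps against a reference set. Below is a minimal sketch of how a per-seed oracle recall could be computed and averaged across seeds. It assumes oracle recall is the fraction of answer-critical ("oracle") lines that survive compression. `scripts/benchmark_web.py` may define it differently. The function names and toy runs are hypothetical.

```python
from statistics import mean

def oracle_recall(kept: list[str], oracle: list[str]) -> float:
    """Assumed definition: share of answer-critical lines that survive compression."""
    if not oracle:
        return 1.0  # nothing to recover
    kept_set = set(kept)
    return sum(line in kept_set for line in oracle) / len(oracle)

def mean_oracle_recall(runs: list[tuple[list[str], list[str]]]) -> float:
    """Average per-seed recall, e.g. over the 8 seeds in the table."""
    return mean(oracle_recall(kept, oracle) for kept, oracle in runs)

# Toy runs: (lines a policy kept, oracle lines for that seed).
runs = [
    (["a", "b"], ["b"]),       # 1.0
    (["a"], ["b"]),            # 0.0
    (["b", "c"], ["b", "c"]),  # 1.0
    (["c"], ["b", "c"]),       # 0.5
]
print(f"{mean_oracle_recall(runs):.1%}")  # → 62.5%
```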

Charts: `web/assets/img/chart-kv-savings.svg`, `chart-oracle-recall.svg`, `chart-impact.svg`

---

## Project layout

```
supercompress/          # Core library (~5K-param policy, baselines)
  api/                  # Hosted API — keys, Firebase auth, usage
web/                    # Landing page + browser demo + dashboard
scripts/                # benchmark_web.py, local_web_server.py, charts
tests/                  # test_supercompress, test_api_hard, test_api_server
checkpoints/default.pt  # Trained weights (included)
docs/                   # API, integrations, environment, dashboard
```

---

## Development

```bash
git clone https://github.com/arjunkshah/supercompress.git
cd supercompress
pip install -e ".[dev,serve]"
pytest tests/ -q
python scripts/local_web_server.py   # optional: /dashboard, /v1/compress
```

Optional extras:

```bash
pip install -e ".[firebase]"   # Firebase Admin for production key store
```

---
NOTE: I DO NOT GIVE AYUSH ROUT (github.com/ayushrout12) ANY PERMISSION TO COPY OR USE MY PRODUCT IN ANY WAY, SHAPE, OR FORM. I DO NOT GIVE HIM CONSENT TO FORK, CLONE, REFERENCE, OR USE THIS REPO IN ANY WAY, SHAPE, OR FORM.

---

## What we claim (and don't)

**We claim:** learned CPU eviction beats truncation on oracle recall at similar KV savings; documented environmental estimates; reproducible benchmarks and tests.

**We don't claim:** live datacenter metering; CO₂ numbers without documented assumptions; that every workload matches benchmark seeds.

---

## License

MIT — see [LICENSE](LICENSE).
