Metadata-Version: 2.4
Name: fraqtl-runtime
Version: 0.1.0
Summary: fraQtl runtime — drop-in KV cache compression + INT3-resident weight loading for HuggingFace transformers.
Author-email: Samuel Salfati <samuelsalfati@gmail.com>
License: Proprietary
Project-URL: Homepage, https://fraqtl.ai
Project-URL: Source, https://github.com/fraqtl-ai
Project-URL: HuggingFace, https://huggingface.co/fraQtl
Project-URL: Issues, https://github.com/fraqtl-ai/fraqtl-diagnostic/issues
Keywords: transformers,compression,kv-cache,quantization,inference,long-context,moe
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Science/Research
Classifier: Intended Audience :: Developers
Classifier: License :: Other/Proprietary License
Classifier: Operating System :: POSIX :: Linux
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Requires-Python: >=3.10
Description-Content-Type: text/markdown
License-File: LICENSE
Provides-Extra: inference
Requires-Dist: torch>=2.0; extra == "inference"
Requires-Dist: transformers>=4.51; extra == "inference"
Requires-Dist: safetensors; extra == "inference"
Requires-Dist: huggingface_hub>=0.26; extra == "inference"
Requires-Dist: numpy; extra == "inference"
Provides-Extra: dev
Requires-Dist: pytest; extra == "dev"
Requires-Dist: ruff; extra == "dev"
Requires-Dist: build; extra == "dev"
Requires-Dist: twine; extra == "dev"
Provides-Extra: bench
Requires-Dist: datasets; extra == "bench"
Requires-Dist: scipy; extra == "bench"
Dynamic: license-file

# fraQtl

**5x KV cache compression. As low as +0.005 PPL. 7 models, 3B–70B. One line of code.**

Runtime KV-cache compression via the Attention Importance Kernel. Protect the directions that matter. Quantize the rest. Drop-in, no retraining, production-ready.

## Results (verified, 7 models)

| Model | Params | Arch | ΔPPL (k=16) | ΔPPL (k=32) |
|-------|--------|------|-------------|-------------|
| Mistral 7B | 7B | GQA-8 | +0.019 | +0.007 |
| Llama 3.2 3B | 3B | GQA-3 | +0.043 | +0.011 |
| Llama-2-7B | 7B | MHA-32 | +0.022 | +0.007 |
| Qwen 2.5 3B | 3B | GQA-2 | +0.034 | +0.010 |
| Llama 3.1 8B | 8B | GQA-8 | +0.034 | +0.025 |
| Llama-2-13B | 13B | MHA-40 | +0.019 | +0.005 |
| Llama 3.1 70B | 70B | GQA-8 | +0.079 | +0.019 |

All numbers measured at runtime on the live KV cache. Split prefill/eval methodology. Same config for every model; k is the number of protected eigendirections.
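As a sketch of what "split prefill/eval" means here (a hypothetical helper, not the fraQtl harness): the prefill prefix only populates the compressed cache, and perplexity is computed exclusively over the evaluation segment that follows it.

```python
import math

def split_ppl(token_nlls, prefill_len):
    """Perplexity over the eval segment only: tokens after the prefill
    prefix, which exists solely to populate the (compressed) KV cache."""
    eval_nlls = token_nlls[prefill_len:]
    return math.exp(sum(eval_nlls) / len(eval_nlls))

# Example: 4 prefill tokens (ignored), 4 eval tokens scored.
nlls = [5.0, 4.0, 3.0, 2.0, 1.0, 1.0, 1.0, 1.0]
print(split_ppl(nlls, prefill_len=4))  # exp(1.0) ≈ 2.718
```

Scoring only the post-prefill tokens is what makes the deltas comparable across models: every ΔPPL is measured against the same continuation, with the cache already holding compressed entries.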

### vs Competition (Llama-2-7B)

| Method | PPL Delta | Compression |
|--------|-----------|-------------|
| **fraQtl k=32** | **+0.007** | **5x** |
| **fraQtl k=16** | **+0.022** | **5x** |
| KVQuant 2-bit | +0.27 | ~5x |
| KIVI K2V2 | +1.00 | ~5x |

### Memory at Scale

| Context | KV Cache (FP16) | fraQtl 5x | Savings |
|---------|----------------|-----------|---------|
| 4K | 2.1 GB | 430 MB | 1.7 GB |
| 32K | 17 GB | 3.4 GB | **14 GB** |
| 128K | 69 GB | 14 GB | **55 GB** |
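The table's numbers follow from straightforward arithmetic. A sketch assuming Llama-2-7B geometry (32 layers, 32 KV heads, head dim 128, 2 bytes per FP16 element, decimal GB) — swap in your own model's shapes as needed:

```python
def kv_cache_bytes(ctx_len, layers=32, kv_heads=32, head_dim=128, bytes_per_elem=2):
    # K and V each store layers * kv_heads * head_dim elements per token
    return 2 * layers * kv_heads * head_dim * bytes_per_elem * ctx_len

for ctx in (4096, 32768, 131072):
    fp16_gb = kv_cache_bytes(ctx) / 1e9  # decimal GB, as in the table
    print(f"{ctx:>6} tokens: FP16 {fp16_gb:5.1f} GB -> fraQtl 5x {fp16_gb / 5:5.2f} GB")
```

For GQA models the `kv_heads` count is the number of KV heads (e.g. 8 for Llama 3.1 8B/70B), not the number of query heads, which is why GQA caches are already smaller before compression.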

## Install

```bash
pip install git+https://github.com/samuelsalfati/fraqtl.git
```

## Quick Start

```python
import fraqtl

# Authenticate (get token at fraqtl.ai)
fraqtl.login("sk_fraqtl_...")

# Compress
from transformers import AutoModelForCausalLM
model = AutoModelForCausalLM.from_pretrained("mistralai/Mistral-7B-v0.1",
                                              torch_dtype="float16", device_map="auto")

# calib_seqs: a small set of representative token sequences used to
# calibrate the importance kernel (see the fraqtl docs for the exact format)
model = fraqtl.aipress_kv(model, calib_seqs)
# That's it. Serve normally.
```

### CLI

```bash
fraqtl compress --model mistralai/Mistral-7B-v0.1 --k 16 --eval
fraqtl analyze --model mistralai/Mistral-7B-v0.1
```

## How It Works

1. **Eigenbasis** — compute the Attention Importance Kernel (V^T alpha^T alpha V) from one forward pass
2. **Protect** — top-k eigendirections at full precision
3. **Sacrifice** — remaining directions at INT3
4. **Zero overhead** — W_O fusion absorbs rotation into weights
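The steps above can be sketched in a few lines of NumPy. This is a toy illustration, not the shipped kernel: `alpha` is a stand-in for the attention weights, shapes are arbitrary, and the W_O fusion from step 4 is only noted in a comment.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n, k = 64, 256, 16          # head dim, cached tokens, protected directions

V = rng.standard_normal((n, d)).astype(np.float32)      # toy value cache
alpha = rng.standard_normal((n, n)).astype(np.float32)  # toy attention weights

# 1. Eigenbasis of the importance kernel K = V^T alpha^T alpha V
K = V.T @ alpha.T @ alpha @ V
_, Q = np.linalg.eigh(K)       # eigenvalues in ascending order
Q = Q[:, ::-1]                 # most important directions first

# 2-3. Rotate into the eigenbasis, keep the top-k coordinates at full
# precision, round the rest onto a symmetric 3-bit grid (8 levels)
Z = V @ Q
rest = Z[:, k:]
scale = np.abs(rest).max() / 4.0
q = np.clip(np.round(rest / scale), -4, 3)
Z_hat = np.concatenate([Z[:, :k], q * scale], axis=1)

# 4. At inference the rotation costs nothing, since Q can be folded
# into W_O; here we rotate back explicitly just to measure the error.
V_hat = Z_hat @ Q.T
err = np.linalg.norm(V - V_hat) / np.linalg.norm(V)
print(f"relative reconstruction error: {err:.3f}")
```

Because K = (αV)ᵀ(αV) is symmetric PSD, `eigh` yields an orthonormal Q, so the rotation is exactly invertible and all reconstruction error comes from the INT3 rounding of the sacrificed directions.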

## Paper

**"The Right Basis, Not the Right Subspace: Downstream-Optimal Quantization for KV-Cache Compression"**

Samuel Salfati, Cornell University

## Patent

Patent pending (filed April 6, 2026).

## License

Proprietary. Early access available at [fraqtl.ai](https://fraqtl.ai).
