Metadata-Version: 2.4
Name: fraqtl-runtime
Version: 0.1.1
Summary: fraQtl runtime — drop-in loader for fraQtl-compressed Hugging Face checkpoints. Production LLM inference with calibration-aware compression.
Author-email: Samuel Salfati <samuelsalfati@gmail.com>
License: Proprietary
Project-URL: Homepage, https://fraqtl.ai
Project-URL: Source, https://github.com/fraqtl-ai
Project-URL: HuggingFace, https://huggingface.co/fraQtl
Project-URL: Issues, https://github.com/fraqtl-ai/fraqtl-diagnostic/issues
Keywords: transformers,compression,kv-cache,quantization,inference,long-context,moe
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Science/Research
Classifier: Intended Audience :: Developers
Classifier: License :: Other/Proprietary License
Classifier: Operating System :: POSIX :: Linux
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Requires-Python: >=3.10
Description-Content-Type: text/markdown
License-File: LICENSE
Provides-Extra: inference
Requires-Dist: torch>=2.0; extra == "inference"
Requires-Dist: transformers>=4.51; extra == "inference"
Requires-Dist: safetensors; extra == "inference"
Requires-Dist: huggingface_hub>=0.26; extra == "inference"
Requires-Dist: numpy; extra == "inference"
Provides-Extra: dev
Requires-Dist: pytest; extra == "dev"
Requires-Dist: ruff; extra == "dev"
Requires-Dist: build; extra == "dev"
Requires-Dist: twine; extra == "dev"
Provides-Extra: bench
Requires-Dist: datasets; extra == "bench"
Requires-Dist: scipy; extra == "bench"
Dynamic: license-file

# fraQtl

**Runtime KV-cache and weight compression for production LLM inference.**

Drop-in. No retraining. Calibration-aware.

---

## What it is

`fraqtl-runtime` is the runtime loader for fraQtl-compressed model artifacts. It enables:

- **Weight compression**: load fraQtl-compressed Hugging Face checkpoints (e.g. [`fraQtl/Qwen3.6-35B-A3B-compressed`](https://huggingface.co/fraQtl/Qwen3.6-35B-A3B-compressed)) via standard `transformers` with `trust_remote_code=True`. The wheel ships the compiled loader that decodes the packed weights at load time.
- **Runtime KV-cache compression** (separate, in active validation): a llama.cpp-compatible layer that compresses the V cache at runtime, independent of the weight format.
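The KV-cache layer's API is not yet public. As a rough illustration only — the function names and the per-head symmetric int8 scheme below are assumptions for the sketch, not fraQtl's actual method — runtime V-cache quantization can look like:

```python
import numpy as np

def quantize_v_cache(v, bits=8):
    """Hypothetical sketch of runtime V-cache quantization (not fraQtl's
    actual scheme): per-head symmetric quantization with stored scales.
    v: [n_heads, seq_len, head_dim] float array."""
    qmax = 2 ** (bits - 1) - 1
    # One scale per head, taken over all cached tokens
    scales = np.abs(v).max(axis=(1, 2), keepdims=True) / qmax
    scales = np.where(scales == 0, 1.0, scales)
    q = np.clip(np.round(v / scales), -qmax - 1, qmax).astype(np.int8)
    return q, scales

def dequantize_v_cache(q, scales):
    # Broadcast the per-head scales back over [seq_len, head_dim]
    return q.astype(np.float32) * scales
```

The point of the sketch is only that the compressed cache plus its scales round-trip to a close approximation of the original V tensor, whatever the real scheme does on top.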

---

## Install

```bash
pip install fraqtl-runtime
```

That's the entire setup. No license token required for loading published artifacts.

---

## Quick start

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

repo = "fraQtl/Qwen3.6-35B-A3B-compressed"
model = AutoModelForCausalLM.from_pretrained(
    repo, trust_remote_code=True,
    torch_dtype=torch.bfloat16, device_map="auto",
)
tok = AutoTokenizer.from_pretrained(repo)

ids = tok("The capital of France is", return_tensors="pt").to(model.device)
print(tok.decode(model.generate(**ids, max_new_tokens=20, do_sample=False)[0]))
```

`trust_remote_code=True` pulls a small stub from the model repo that imports the compiled loader from this wheel. You never write `import fraqtl` directly.

---

## High-level approach

fraQtl combines two ideas:

1. **Calibration-aware eigenbasis rotation** — protect the input directions that matter for the deployment task; quantize the rest. The calibration corpus determines which directions are protected (this is **FPT — fraQtl Pullback Theorem**).
2. **Per-row sign correction primitive** — additional precision on top of low-bit quantization where it matters most for reasoning.
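A minimal sketch of idea 1, assuming a plain covariance eigenbasis and a uniform low-bit quantizer — the function name, the `n_protect` cutoff, and the quantizer are illustrative choices, not fraQtl's implementation:

```python
import numpy as np

def calibration_rotation(X, W, n_protect=8, bits=3):
    """Illustrative sketch: rotate a weight matrix into the eigenbasis of
    calibration activations, keep the top directions at full precision,
    and coarsely quantize the rest. Hypothetical names and thresholds.
    X: [n_samples, d_in] calibration activations; W: [d_out, d_in]."""
    # Covariance of the calibration activations
    cov = X.T @ X / len(X)
    # Eigenbasis, columns sorted by descending eigenvalue
    eigvals, Q = np.linalg.eigh(cov)
    Q = Q[:, np.argsort(eigvals)[::-1]]
    # Rotate the weights into that basis
    W_rot = W @ Q
    W_q = W_rot.copy()
    levels = 2 ** bits
    # Uniformly quantize only the unprotected (low-variance) directions
    for j in range(n_protect, W_rot.shape[1]):
        col = W_rot[:, j]
        lo, hi = col.min(), col.max()
        step = (hi - lo) / (levels - 1) or 1.0
        W_q[:, j] = np.round((col - lo) / step) * step + lo
    # Rotate back so the result applies to unmodified inputs
    return W_q @ Q.T
```

The protected directions are exactly those along which the calibration corpus puts the most activation energy, which is why the choice of corpus determines what the compressed model preserves.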

Both compose with standard quantization machinery (Lloyd-Max centroids, INT3 packing) and standard inference engines (HF transformers, llama.cpp).
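For reference, the Lloyd-Max step mentioned above is the textbook 1-D alternating iteration (this is the generic algorithm, not fraQtl's code):

```python
import numpy as np

def lloyd_max_centroids(x, bits=3, iters=25):
    """Textbook 1-D Lloyd-Max quantizer: alternate between assigning
    samples to their nearest centroid and moving each centroid to the
    mean of its assigned samples."""
    k = 2 ** bits                                   # INT3 -> 8 centroids
    # Initialize centroids at evenly spaced quantiles of the data
    centroids = np.quantile(x, np.linspace(0, 1, k))
    for _ in range(iters):
        # Assignment step: nearest centroid per sample
        idx = np.abs(x[:, None] - centroids[None, :]).argmin(axis=1)
        # Update step: move each centroid to its cluster mean
        for j in range(k):
            if np.any(idx == j):
                centroids[j] = x[idx == j].mean()
    return centroids, idx
```

With `bits=3` the centroid table holds 8 values, so each weight needs only a 3-bit index — that is the "INT3 packing" half of the pipeline.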

---

## Status

- **Public weight-compression artifacts** on Hugging Face: [huggingface.co/fraQtl](https://huggingface.co/fraQtl)
- **Runtime KV-cache compression** layer: in active validation. Public benchmark numbers will follow H100 measurement lock and manual review.
- **Methodology paper** in preparation.

---

## Links

- Site: [fraqtl.ai](https://fraqtl.ai)
- Hugging Face: [huggingface.co/fraQtl](https://huggingface.co/fraQtl)
- Diagnostic tool (open-source): [`fraqtl-diagnostic`](https://pypi.org/project/fraqtl-diagnostic/)
- Contact: contact@fraqtl.ai

---

## License

Proprietary. The compressed model weights and loader are free to install and use for research and evaluation. Production / commercial use: contact fraQtl.
