Metadata-Version: 2.4
Name: ultracompress
Version: 0.6.21
Summary: Lossless 5-bit transformer compression — 20 PPL-verified architectures (19 transformer + 1 state-space model with comparator-note caveat) across 22 packs shipped (0.6B–405B; dense + MoE + SSM). Public CLI does pack structure + download-integrity checks; the codec is patent-pending and not distributed. BUSL-1.1 + Additional Use Grant.
Author-email: "Sipsa Labs, Inc." <founder@sipsalabs.com>
License: BUSL-1.1
Project-URL: Homepage, https://github.com/sipsalabs/ultracompress
Project-URL: Repository, https://github.com/sipsalabs/ultracompress
Project-URL: Issues, https://github.com/sipsalabs/ultracompress/issues
Keywords: model-compression,llm,transformer,quantization,inference
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Science/Research
Classifier: License :: Other/Proprietary License
Classifier: Programming Language :: Python :: 3
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Requires-Python: >=3.10
Description-Content-Type: text/markdown
License-File: LICENSE
License-File: LICENSE.apache
License-File: NOTICE.md
Provides-Extra: dev
Requires-Dist: build>=1.5; extra == "dev"
Requires-Dist: hypothesis>=6.148; extra == "dev"
Requires-Dist: mypy>=1.20; extra == "dev"
Requires-Dist: pre-commit>=4.5; extra == "dev"
Requires-Dist: pytest>=9.0; extra == "dev"
Requires-Dist: pytest-cov>=7.0; extra == "dev"
Requires-Dist: ruff>=0.15; extra == "dev"
Requires-Dist: twine>=6.2; extra == "dev"
Dynamic: license-file

# UltraCompress

Lossless 5-bit transformer compression. Published model artifacts are bit-identical to their bf16 reference.

[![PyPI](https://img.shields.io/badge/pypi-0.6.21-blue.svg)](https://pypi.org/project/ultracompress/)
[![License](https://img.shields.io/badge/license-BUSL--1.1-blue.svg)](LICENSE)
[![Python 3.10+](https://img.shields.io/badge/python-3.10+-blue.svg)](https://www.python.org/downloads/)
[![Patent](https://img.shields.io/badge/patent-pending-orange.svg)](./PATENT_NOTICE.md)

> **v0.6.21:** the public package is intentionally minimal — a small,
> dependency-free CLI that lets you (a) generate text against a Sipsa-
> hosted compressed model in 30 seconds (`uc try`), (b) browse the full
> catalog with tiers and PPL ratios (`uc catalog`), and (c) verify pack
> **structure** and **download integrity** (`uc verify`) on any pack you
> download from HuggingFace. It contains **no** compression or
> reconstruction code: that methodology is patent-pending and is not
> distributed. Bit-identical reconstruction verification of a pack is
> performed by Sipsa Labs under engagement.

Hermes-3-Llama-3.1-405B compressed at 5 bpw lossless: **1.0066x PPL ratio** vs streaming bf16 teacher (5.0692 / 5.0358, n=50, seq_len=1024, FineWeb-edu held-out tail, seed=42). A 405B-class transformer compressed end-to-end on a single 32 GB consumer GPU.

UltraCompress takes a transformer at fp16/bf16 and produces a 5-bit pack that reconstructs **bit-identically** to the reference bf16 checkpoint — not "1% PPL drift on WikiText," but a deterministic reconstruction. That is the honest definition of lossless we care about: an auditor can re-derive every weight from the pack, and Sipsa Labs verifies that bit-identity under engagement. The codec is patent-pending.

It exists because the bf16-equivalent quality bar matters in places where "good enough on MMLU" isn't enough — defense, FDA-regulated healthcare, SR 11-7 model validation, internal red-team eval at frontier labs. And as a side-effect of the streaming compression path, it lets us put a 405B-parameter model through a single 32 GB consumer GPU without renting an H100 cluster.

We're a small lab shipping this in public while the patents are pending. Most days the lab notebook gets longer than the marketing site does.

---

> **Regulated AI deployment?** Phase 0 POC is **$5K / 5 business days / customer-picked model** — full details in [Who this is for](#who-this-is-for) below. Direct: `founder@sipsalabs.com`. Verticals: **[healthcare](https://sipsalabs.com/vertical/healthcare)** · **[defense](https://sipsalabs.com/vertical/defense)** · **[legal](https://sipsalabs.com/vertical/legal)** · **[quant](https://sipsalabs.com/vertical/quant)**.

---

## Quick start (30 seconds, no GPU, no signup)

```bash
pip install ultracompress
uc try sipsa-qwen3-0.6b
```

That prints a recorded reference response from our 5-bit-compressed Qwen3-0.6B pack plus the compression numbers, and points you at the next step. With a free key from [sipsalabs.com/get-access](https://sipsalabs.com/get-access) (60-second signup), the same command goes live against `api.sipsalabs.com` and streams real output from whichever compressed model you pick.

```bash
uc catalog
```

Lists all 20 PPL-verified architectures (19 transformer + 1 state-space model with comparator-note caveat) with their published PPL ratios and tier (free / request / POC).

## The public CLI (what `pip install` gives you)

```
uc try [model]         generate text against a Sipsa-hosted compressed model
uc catalog             list the full compressed-model catalog + tiers
uc verify <pack_dir>   pack structure + download-integrity self-check
uc info                what this package is + links/contact
uc version             print version
```

`uc try` calls `api.sipsalabs.com/v1/chat/completions` when you pass `--key sk-sps-...` or set `$SIPSA_API_KEY`; without a key, it prints a recorded reference response so you see what compressed output looks like without signup.

`uc verify` confirms a downloaded pack is well-formed (manifest present and parseable, declared layer count matches the files on disk, no zero-byte layers) and prints a stable SHA-256 **pack fingerprint** so you can confirm you hold a byte-identical download, or compare against a fingerprint we publish out of band. It does **not** reconstruct weights and contains no codec knowledge by design.

```bash
hf download SipsaLabs/qwen3-1.7b-base-uc-v3-bpw5 --local-dir ./pack
uc verify ./pack
```

```
bpw:             5
layer files:     28
SHA-256 (spot-check; use --full for all):
  manifest.json:f3a1…
  layer_000.uc:7c2b…
  layer_014.uc:9d4f…
  layer_027.uc:1ab8…
pack fingerprint (sha256 of sorted file digests):
  4e9c… (64 hex)

→ STRUCTURE OK — pack is well-formed; fingerprint above is the
  download-integrity reference. This is NOT a reconstruction proof;
  bit-identical reconstruction verification is provided by Sipsa
  Labs under engagement (founder@sipsalabs.com).
```

Full bit-identical reconstruction verification (and PPL re-evaluation against the bf16 baseline) is an auditor-grade deliverable Sipsa Labs runs with you under engagement — it is deliberately not shipped in the public package.

---

## What's verified (with JSON receipts)

**20 architectures independently PPL-verified end-to-end** (0.6B → 405B, dense + MoE + SSM) against each model's own bf16 baseline on the FineWeb-edu held-out tail at seq_len=1024, seed=42. 19 are transformer architectures; the 20th is Mamba-2.8B (state-space model) at **1.00593× canonical PPL**, with an explicit comparator-note caveat in the registry: our canonical transformer pipeline (RoPE / attention masks / KV-cache semantics) is architecture-incompatible with SSMs, so the Mamba record uses an SSM-compatible comparator that matches what's in the HF pack. Additional architectures (DeepSeek-32B, Llama-3.1-70B, others) are compressed and SHA-256-verified but their PPL numbers are pending canonical re-eval before formal registry entry. Every published number traces to a published result JSON. A small set of packs is publicly downloadable; the full catalog is available to customers under engagement.

| Model | Params | Class | PPL ratio | HF artifact | Status |
|---|---|---|---|---|---|
| Hermes-3-Llama-3.1-405B | 405B | 405B-class lossless on a single 32 GB consumer GPU | **1.0066** | [`SipsaLabs/hermes-3-llama-3.1-405b-uc-v3-bpw5`](https://huggingface.co/SipsaLabs/hermes-3-llama-3.1-405b-uc-v3-bpw5) | live |
| Mistral-7B-v0.3 | 7.2B | sub-0.6% drift | **1.00548** | [`SipsaLabs/mistral-7b-v0.3-uc-v3-bpw5`](https://huggingface.co/SipsaLabs/mistral-7b-v0.3-uc-v3-bpw5) | live |
| Qwen3-1.7B-Base | 1.7B | sub-0.5% drift | **1.00401** | `SipsaLabs/qwen3-1.7b-base-uc-v3-bpw5` | live |
| Qwen3-14B | 14.0B | sub-0.5% drift | **1.00403** | `SipsaLabs/qwen3-14b-uc-v3-bpw5` | live |
| Qwen3-8B | 8.0B | sub-0.5% drift | **1.00440** | `SipsaLabs/qwen3-8b-uc-v3-bpw5` | live |
| Mixtral-8x7B-v0.1 (MoE) | 47B (13B active) | sub-0.5% drift | **1.00368** | `SipsaLabs/mixtral-8x7b-v0.1-uc-v3-bpw5` | live |
| Phi-3-mini-4k-instruct | 3.8B | sub-0.3% drift (seq_len=128, not apples-to-apples) | **1.00262** | `SipsaLabs/phi-3-mini-4k-instruct-uc-v3-bpw5` | live |

Hermes-3-405B is the headline. The 1.0066x ratio is `5.0692 / 5.0358` — both halves measured under the same per-layer streaming reconstruction comparator (n=50, seq_len=1024, FineWeb-edu held-out tail, seed=42). The bf16 teacher took 7.7 hours on cuda:1; the 5-bpw pack took 14.3 hours. The Mistral-7B 1.00548× row is the tightest dense 7B-class lossless 5-bit ratio we currently publish.

- **SSM result**: Mamba-2.8B compressed with SHA-256 bit-identical reconstruction verified — first public lossless 5-bit canonical-PPL result on a state-space model that we know of, at **1.00593× canonical ratio**. Counted as the 20th verified architecture with an explicit comparator-note caveat in the registry: our canonical transformer pipeline (RoPE / attention masks / KV-cache semantics that don't apply to SSMs) is architecture-incompatible, so the Mamba record uses an SSM-compatible comparator that matches what's in the HF pack. The comparator is documented in the registry.
- **HuggingFace**: a small public verification set under [`huggingface.co/SipsaLabs`](https://huggingface.co/SipsaLabs); full catalog under engagement.
- **PyPI**: [pypi.org/project/ultracompress](https://pypi.org/project/ultracompress/).

---

## What doesn't work yet

Things people sometimes assume work because the rest of it does. They don't, and we'd rather you know:

- **Long-context evaluation past seq_len=1024.** Every PPL number above is at seq_len=1024 on the FineWeb-edu held-out tail. We have not yet run controlled evals at 4K/8K/32K context.
- **State-space models past the current SSM pack.** Mamba-2.8B ships + SHA-256-verified + canonical PPL claimed at **1.00593×** (with comparator-note caveat documented in registry; our canonical transformer pipeline is architecture-incompatible with SSMs). We tried two tighter paths on top — both made it worse.
- **TinyLlama-1.1B-Chat PPL eval.** The pack itself is well-formed and the HF artifact uploaded, but the PPL eval forward pass throws a CUDA device-side assert that we haven't traced yet. Shown as deferred, not a fabricated number.
- **Qwen3-32B and Llama-3.1-70B PPL ratios.** Both have stale or suspect baseline PPL numbers we won't republish. Apples-to-apples re-evals are queued.
- **Below 1.0040× on Qwen3-1.7B-Base.** This is our tightest dense floor; we tried 5 different paths to break it. Three were within noise; two were catastrophic regressions. 1.0040× stands as the empirical floor at the current configuration.

---

## Why this isn't AWQ / GPTQ / EXL3

Every other 4–5 bit compression library targets a quality threshold ("sub-1% PPL on WikiText"). UltraCompress targets a **reconstruction contract**: the published artifact reconstructs bit-identically to the reference bf16 checkpoint. Codec internals are patent-pending and deliberately not described here.

This matters when "the model picks a slightly-wrong variable name" is a regulatory finding rather than a cosmetic complaint. Defense / aerospace deploy-bit-exactness is a compliance requirement. FDA-regulated healthcare AI requires model equivalence between dev and deploy. SR 11-7 (Federal Reserve model validation) requires reproducible audit recovery.

For pure-throughput inference on a fixed prompt distribution that matches your AWQ calibration set, with no downstream fine-tuning, AWQ at 4 bpw on vLLM is genuinely fine and we'll say so on a sales call.

As of mid-2026 we are not aware of another published library targeting a bit-identical reconstruction contract (as opposed to a PPL-threshold) for 5-bit transformer compression on the public HuggingFace Hub. If you find one, tell us — we'd rather benchmark against it than claim a gap that isn't there.

---

## Honest negative results

Most projects hide their failures. We catalogue them at the same level of detail as the wins.

- **An initialization shortcut we tried** — made PPL 0.07 pp WORSE on Mamba and was discarded. Method specifics withheld (patent-pending).
- **A multi-pass variant we hypothesized would help** — produced a catastrophic 13.7× regression vs. the single-pass baseline. CLOSED.
- **Importing an AWQ-style pre-scaling step** — produced a catastrophic +13% regression and was ruled out. CLOSED.
- **Pushing the training schedule past the current configuration** — gained nothing (within noise). The floor stands.
- **"Base models compress tighter than instruct" hypothesis** — refuted 2/3 of architectures. Dropped.

Detailed methodology for any specific failure is available to design partners under NDA.

---

## Who this is for

- **If you serve LLMs in production and your VRAM bill is the constraint**, this might help. It scales to a 405B-class model on a single 32 GB consumer GPU (the how is patent-pending). Email `founder@sipsalabs.com` with your stack and a target latency/quality bar; we'll tell you honestly whether UC fits.
- **If you're in a regulated domain** (defense, FDA-regulated healthcare, SR 11-7 model validation, frontier lab red-team), the bit-identical reconstruction contract is the reason to talk to us. Phase 0 POC ($5K, 5 business days, customer-picked model) gets you a pack plus a Sipsa-run bit-identity + PPL audit you can review. Email `founder@sipsalabs.com`.

If your workload is "MMLU has to stay above X" and you're not pushing the model into long-tail or downstream-fine-tuning territory, AWQ at 4 bpw is probably a better answer than this. We'll say so.

---

## We're a small company looking for design partners

Sipsa Labs is a small lab shipping in public. Our compression methods are patent-pending; details are in [`PATENT_NOTICE.md`](./PATENT_NOTICE.md). The CLI source is BUSL-1.1 with an Additional Use Grant — free for companies under $1M ARR, research, and individuals, auto-converting to Apache 2.0 four years post-release. If you're building a derivative product whose core value depends on the underlying invention, email `founder@sipsalabs.com`.

- **Paid Phase 0 POC** — `founder@sipsalabs.com`, $5K / 5 business days / customer-picked model. Deliverable: a pack plus a Sipsa-run bit-identity + PPL audit on your eval set.
- **GitHub Sponsors** — [github.com/sponsors/sipsalabs](https://github.com/sponsors/sipsalabs).
- **Press / commentary** — `press@sipsalabs.com`.

---

## License

- Released under **BUSL-1.1 with an Additional Use Grant** (free for companies under $1M ARR, research, and individuals; auto-converts to Apache 2.0 four years post-release). See [`LICENSE`](./LICENSE).
- The license grant does **not** extend to the patent-pending compression methodology that produces the artifacts. See [`PATENT_NOTICE.md`](./PATENT_NOTICE.md).
- Pre-compressed model artifacts on HuggingFace carry the upstream teacher model's license plus this project's patent terms.

## Citation

```bibtex
@software{sipsa_ultracompress_2026,
  author = {{Sipsa Labs, Inc.}},
  title  = {UltraCompress: Lossless 5-bit Transformer Compression},
  year   = {2026},
  url    = {https://github.com/sipsalabs/ultracompress}
}
```

## Contact

- Commercial / Phase 0 POC: `founder@sipsalabs.com`
- Security: `security@sipsalabs.com`
- Press: `press@sipsalabs.com`
- HuggingFace: [`huggingface.co/SipsaLabs`](https://huggingface.co/SipsaLabs)
- PyPI: [`pypi.org/project/ultracompress`](https://pypi.org/project/ultracompress/)
