Metadata-Version: 2.4
Name: ultracompress
Version: 0.6.25
Summary: Near-lossless 5-bit transformer compression (~1% perplexity vs the bf16 reference; lossy), with reproducible, cryptographically verifiable reconstruction — 22 PPL-verified architectures (17 dense + 4 MoE + 1 SSM) plus 1 ViT cosine-verified — across 4 architecture classes (0.6B–405B). Public CLI does pack structure + download-integrity checks; the codec is patent-pending and not distributed. BUSL-1.1 + Additional Use Grant.
Author-email: "Sipsa Labs, Inc." <founder@sipsalabs.com>
License: BUSL-1.1
Project-URL: Homepage, https://github.com/sipsalabs/ultracompress
Project-URL: Repository, https://github.com/sipsalabs/ultracompress
Project-URL: Issues, https://github.com/sipsalabs/ultracompress/issues
Keywords: model-compression,llm,transformer,quantization,inference
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Science/Research
Classifier: License :: Other/Proprietary License
Classifier: Programming Language :: Python :: 3
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Requires-Python: >=3.10
Description-Content-Type: text/markdown
License-File: LICENSE
License-File: LICENSE.apache
License-File: NOTICE.md
Provides-Extra: dev
Requires-Dist: build>=1.5; extra == "dev"
Requires-Dist: hypothesis>=6.148; extra == "dev"
Requires-Dist: mypy>=1.20; extra == "dev"
Requires-Dist: pre-commit>=4.5; extra == "dev"
Requires-Dist: pytest>=9.0; extra == "dev"
Requires-Dist: pytest-cov>=7.0; extra == "dev"
Requires-Dist: ruff>=0.15; extra == "dev"
Requires-Dist: twine>=6.2; extra == "dev"
Dynamic: license-file

# UltraCompress

Near-lossless 5-bit transformer compression (~1% perplexity vs the bf16 reference — it is lossy). Published model artifacts ship with reproducible, cryptographically verifiable reconstruction: SHA-256-pinned packs an auditor can deterministically decode back to the validated artifact.

[![PyPI](https://img.shields.io/badge/pypi-0.6.25-blue.svg)](https://pypi.org/project/ultracompress/)
[![License](https://img.shields.io/badge/license-BUSL--1.1-blue.svg)](LICENSE)
[![Python 3.10+](https://img.shields.io/badge/python-3.10+-blue.svg)](https://www.python.org/downloads/)
[![Patent](https://img.shields.io/badge/patent-pending-orange.svg)](./PATENT_NOTICE.md)

> **v0.6.25:** the public package is intentionally minimal — a small,
> dependency-free CLI that lets you (a) generate text against a Sipsa-
> hosted compressed model in 30 seconds (`uc try`), (b) browse the full
> catalog with tiers and PPL ratios (`uc catalog`), and (c) verify pack
> **structure** and **download integrity** (`uc verify`) on any pack you
> download from HuggingFace. It contains **no** compression or
> reconstruction code: that methodology is patent-pending and is not
> distributed. Cryptographically verifiable reconstruction of a pack — a
> deterministic decode to its SHA-256-pinned validated artifact — is
> performed by Sipsa Labs under engagement.

Hermes-3-Llama-3.1-405B compressed at 5 bpw, near-lossless: **1.0066x PPL ratio** vs streaming bf16 teacher (5.0692 / 5.0358, n=50, seq_len=1024, FineWeb-edu held-out tail, seed=42). A 405B-class transformer compressed end-to-end on a single 32 GB consumer GPU.

UltraCompress takes a transformer at fp16/bf16 and produces a 5-bit pack that is a **near-lossless, lossy** approximation of the original — on the order of 1% perplexity drift — but with a property other quantizers don't offer: **reproducible, cryptographically verifiable reconstruction**. The decode is deterministic and every pack is SHA-256-pinned, so an auditor can re-derive the exact validated artifact from the pack, byte-for-byte, and confirm they hold the same weights Sipsa Labs evaluated. That is a reproducible, auditable reconstruction of the validated artifact — *not* bit-identity with the original bf16 (the 5-bit pack is lossy). Sipsa Labs runs that verification with you under engagement. The codec is patent-pending.

It exists because the bf16-equivalent quality bar matters in places where "good enough on MMLU" isn't enough — defense, FDA-regulated healthcare, SR 11-7 model validation, internal red-team eval at frontier labs. And as a side-effect of the streaming compression path, it lets us put a 405B-parameter model through a single 32 GB consumer GPU without renting an H100 cluster.

We're a small lab shipping this in public while the patents are pending. Most days the lab notebook gets longer than the marketing site does.

---

> **Regulated AI deployment?** Phase 0 POC is **$5K / 5 business days / customer-picked model** — full details in [Who this is for](#who-this-is-for) below. Direct: `founder@sipsalabs.com`. Verticals: **[healthcare](https://sipsalabs.com/vertical/healthcare)** · **[defense](https://sipsalabs.com/vertical/defense)** · **[legal](https://sipsalabs.com/vertical/legal)** · **[quant](https://sipsalabs.com/vertical/quant)**.

---

## Quick start (30 seconds, no GPU, no signup)

```bash
pip install ultracompress
uc try sipsa-qwen3-0.6b
```

That prints a recorded reference response from our 5-bit-compressed Qwen3-0.6B pack plus the compression numbers, and points you at the next step. With a free key from [sipsalabs.com/get-access](https://sipsalabs.com/get-access) (60-second signup), the same command goes live against `api.sipsalabs.com` and streams real output from whichever compressed model you pick.

```bash
uc catalog
```

Lists the 22 PPL-verified architectures (17 dense + 4 MoE + 1 SSM with comparator-note caveat) plus 1 ViT cosine-verified (DINOv2-Large) — across 4 architecture classes — with their published PPL ratios and tier (free / request / POC).

## The public CLI (what `pip install` gives you)

```
uc try [model]         generate text against a Sipsa-hosted compressed model
uc catalog             list the full compressed-model catalog + tiers
uc verify <pack_dir>   pack structure + download-integrity self-check
uc info                what this package is + links/contact
uc version             print version
```

`uc try` calls `api.sipsalabs.com/v1/chat/completions` when you pass `--key sk-sps-...` or set `$SIPSA_API_KEY`; without a key, it prints a recorded reference response so you see what compressed output looks like without signup.

`uc verify` confirms a downloaded pack is well-formed (manifest present and parseable, declared layer count matches the files on disk, no zero-byte layers) and prints a stable SHA-256 **pack fingerprint** so you can confirm you hold a byte-identical download, or compare against a fingerprint we publish out of band. It does **not** reconstruct weights and contains no codec knowledge by design.

```bash
hf download SipsaLabs/qwen3-1.7b-base-uc-v3-bpw5 --local-dir ./pack
uc verify ./pack
```

```
bpw:             5
layer files:     28
SHA-256 (spot-check; use --full for all):
  manifest.json:f3a1…
  layer_000.uc:7c2b…
  layer_014.uc:9d4f…
  layer_027.uc:1ab8…
pack fingerprint (sha256 of sorted file digests):
  4e9c… (64 hex)

→ STRUCTURE OK — download integrity verified; pack is well-formed
  and the fingerprint above is the per-file SHA-256 reference.
  End-to-end cryptographically verifiable reconstruction (deterministic
  decode to the validated artifact + PPL re-eval) is delivered via `uc audit`
  under engagement (founder@sipsalabs.com); see
  docs/reference/audit-receipt-schema.md for the audit-receipt schema.
```

Full cryptographically verifiable reconstruction (a deterministic decode to the validated artifact, plus PPL re-evaluation against the bf16 baseline) is an auditor-grade deliverable Sipsa Labs runs with you under engagement — it is deliberately not shipped in the public package.

---

## What's verified (with JSON receipts)

**22 architectures independently PPL-verified end-to-end** (0.6B → 405B, 17 dense + 4 MoE + 1 SSM) against each model's own bf16 baseline on the FineWeb-edu held-out tail at seq_len=1024, seed=42 — plus **1 Vision Transformer cosine-verified** (DINOv2-Large, 304M, ViT-L/14), bringing the catalog to **23 architectures across 4 classes**. 21 are transformer (17 dense + 4 MoE); the 22nd PPL-verified row is Mamba-2.8B (state-space model) at **1.00593× canonical PPL**, with an explicit comparator-note caveat in the registry: our canonical transformer pipeline (RoPE / attention masks / KV-cache semantics) is architecture-incompatible with SSMs, so the Mamba record uses an SSM-compatible comparator that matches what's in the HF pack. The 23rd (DINOv2-Large) uses CLS-token cosine similarity instead of PPL (encoder-only ViT has no autoregressive likelihood). TinyLlama-1.1B (1.003×) and Llama-3.1-70B (1.009×) graduated to PPL-verified in v0.6.23. DeepSeek-32B and other queued packs remain SHA-256-verified pending canonical re-eval before formal registry entry. Every published number traces to a published result JSON. A small set of packs is publicly downloadable; the full catalog is available to customers under engagement.

| Model | Params | Class | PPL ratio | HF artifact | Status |
|---|---|---|---|---|---|
| Hermes-3-Llama-3.1-405B | 405B | 405B-class near-lossless on a single 32 GB consumer GPU | **1.0066** | [`SipsaLabs/hermes-3-llama-3.1-405b-uc-v3-bpw5`](https://huggingface.co/SipsaLabs/hermes-3-llama-3.1-405b-uc-v3-bpw5) | live |
| Mistral-7B-v0.3 | 7.2B | sub-0.6% drift | **1.00548** | [`SipsaLabs/mistral-7b-v0.3-uc-v3-bpw5`](https://huggingface.co/SipsaLabs/mistral-7b-v0.3-uc-v3-bpw5) | live |
| Qwen3-1.7B-Base | 1.7B | sub-0.5% drift | **1.00401** | `SipsaLabs/qwen3-1.7b-base-uc-v3-bpw5` | live |
| Qwen3-14B | 14.0B | sub-0.5% drift | **1.00403** | `SipsaLabs/qwen3-14b-uc-v3-bpw5` | live |
| Qwen3-8B | 8.0B | sub-0.5% drift | **1.00440** | `SipsaLabs/qwen3-8b-uc-v3-bpw5` | live |
| Mixtral-8x7B-v0.1 (MoE) | 47B (13B active) | sub-0.5% drift | **1.00368** | `SipsaLabs/mixtral-8x7b-v0.1-uc-v3-bpw5` | live |
| Phi-3-mini-4k-instruct | 3.8B | sub-0.3% drift (seq_len=128, not apples-to-apples) | **1.00262** | `SipsaLabs/phi-3-mini-4k-instruct-uc-v3-bpw5` | live |

Hermes-3-405B is the headline. The 1.0066x ratio is `5.0692 / 5.0358` — both halves measured under the same per-layer streaming reconstruction comparator (n=50, seq_len=1024, FineWeb-edu held-out tail, seed=42). The bf16 teacher took 7.7 hours on cuda:1; the 5-bpw pack took 14.3 hours. The Mistral-7B 1.00548× row is the tightest dense 7B-class near-lossless 5-bit ratio we currently publish.

- **SSM result**: Mamba-2.8B compressed with SHA-256-verified reconstruction (download integrity + deterministic decode to the validated artifact) — first public near-lossless 5-bit canonical-PPL result on a state-space model that we know of, at **1.00593× canonical ratio**. Counted as the 20th verified architecture with an explicit comparator-note caveat in the registry: our canonical transformer pipeline (RoPE / attention masks / KV-cache semantics that don't apply to SSMs) is architecture-incompatible, so the Mamba record uses an SSM-compatible comparator that matches what's in the HF pack. The comparator is documented in the registry.
- **HuggingFace**: a small public verification set under [`huggingface.co/SipsaLabs`](https://huggingface.co/SipsaLabs); full catalog under engagement.
- **PyPI**: [pypi.org/project/ultracompress](https://pypi.org/project/ultracompress/).

---

## What doesn't work yet

Things people sometimes assume work because the rest of it does. They don't, and we'd rather you know:

- **Long-context evaluation past seq_len=1024.** Every PPL number above is at seq_len=1024 on the FineWeb-edu held-out tail. We have not yet run controlled evals at 4K/8K/32K context.
- **State-space models past the current SSM pack.** Mamba-2.8B ships + SHA-256-verified + canonical PPL claimed at **1.00593×** (with comparator-note caveat documented in registry; our canonical transformer pipeline is architecture-incompatible with SSMs). We tried two tighter paths on top — both made it worse.
- **Qwen3-32B PPL ratio.** Stale or suspect baseline PPL number we won't republish. Apples-to-apples re-eval is queued.
- **Below 1.0040× on Qwen3-1.7B-Base.** This is our tightest dense floor; we tried 5 different paths to break it. Three were within noise; two were catastrophic regressions. 1.0040× stands as the empirical floor at the current configuration.

---

## Why this isn't AWQ / GPTQ / EXL3

Every other 4–5 bit compression library targets a quality threshold ("sub-1% PPL on WikiText"). UltraCompress hits a competitive PPL ratio too (~1% drift; it is lossy) but adds a **reproducibility contract**: the published artifact comes with reproducible, cryptographically verifiable reconstruction — a deterministic decode to its SHA-256-pinned validated artifact. Codec internals are patent-pending and deliberately not described here.

This matters when "the model picks a slightly-wrong variable name" is a regulatory finding rather than a cosmetic complaint. Defense / aerospace reproducible, verifiable deployment is a compliance requirement. FDA-regulated healthcare AI requires model equivalence between dev and deploy. SR 11-7 (Federal Reserve model validation) requires reproducible audit recovery.

For pure-throughput inference on a fixed prompt distribution that matches your AWQ calibration set, with no downstream fine-tuning, AWQ at 4 bpw on vLLM is genuinely fine and we'll say so on a sales call.

As of mid-2026 we are not aware of another published library targeting a reproducible, cryptographically verifiable reconstruction contract — a deterministic, SHA-256-auditable decode on top of a competitive PPL ratio — for 5-bit transformer compression on the public HuggingFace Hub. If you find one, tell us — we'd rather benchmark against it than claim a gap that isn't there.

---

## Honest negative results

Most projects hide their failures. We catalogue them at the same level of detail as the wins.

- **An initialization shortcut we tried** — made PPL 0.07 pp WORSE on Mamba and was discarded. Method specifics withheld (patent-pending).
- **A multi-pass variant we hypothesized would help** — produced a catastrophic 13.7× regression vs. the single-pass baseline. CLOSED.
- **Importing an AWQ-style pre-scaling step** — produced a catastrophic +13% regression and was ruled out. CLOSED.
- **Pushing the training schedule past the current configuration** — gained nothing (within noise). The floor stands.
- **"Base models compress tighter than instruct" hypothesis** — refuted 2/3 of architectures. Dropped.

Detailed methodology for any specific failure is available to design partners under NDA.

---

## Who this is for

- **If you serve LLMs in production and your VRAM bill is the constraint**, this might help. It scales to a 405B-class model on a single 32 GB consumer GPU (the how is patent-pending). Email `founder@sipsalabs.com` with your stack and a target latency/quality bar; we'll tell you honestly whether UC fits.
- **If you're in a regulated domain** (defense, FDA-regulated healthcare, SR 11-7 model validation, frontier lab red-team), the reproducible, cryptographically verifiable reconstruction contract is the reason to talk to us. Phase 0 POC ($5K, 5 business days, customer-picked model) gets you a pack plus a Sipsa-run reconstruction + PPL audit you can review. Email `founder@sipsalabs.com`.

If your workload is "MMLU has to stay above X" and you're not pushing the model into long-tail or downstream-fine-tuning territory, AWQ at 4 bpw is probably a better answer than this. We'll say so.

---

## We're a small company looking for design partners

Sipsa Labs is a small lab shipping in public. Our compression methods are patent-pending; details are in [`PATENT_NOTICE.md`](./PATENT_NOTICE.md). The CLI source is BUSL-1.1 with an Additional Use Grant — free for companies under $1M ARR, research, and individuals, auto-converting to Apache 2.0 four years post-release. If you're building a derivative product whose core value depends on the underlying invention, email `founder@sipsalabs.com`.

- **Paid Phase 0 POC** — `founder@sipsalabs.com`, $5K / 5 business days / customer-picked model. Deliverable: a pack plus a Sipsa-run reconstruction + PPL audit on your eval set.
- **GitHub Sponsors** — [github.com/sponsors/sipsalabs](https://github.com/sponsors/sipsalabs).
- **Press / commentary** — `press@sipsalabs.com`.

---

## License

- Released under **BUSL-1.1 with an Additional Use Grant** (free for companies under $1M ARR, research, and individuals; auto-converts to Apache 2.0 four years post-release). See [`LICENSE`](./LICENSE).
- The license grant does **not** extend to the patent-pending compression methodology that produces the artifacts. See [`PATENT_NOTICE.md`](./PATENT_NOTICE.md).
- Pre-compressed model artifacts on HuggingFace carry the upstream teacher model's license plus this project's patent terms.

## Citation

```bibtex
@software{sipsa_ultracompress_2026,
  author = {{Sipsa Labs, Inc.}},
  title  = {UltraCompress: Near-Lossless 5-bit Transformer Compression},
  year   = {2026},
  url    = {https://github.com/sipsalabs/ultracompress}
}
```

## Contact

- Commercial / Phase 0 POC: `founder@sipsalabs.com`
- Security: `security@sipsalabs.com`
- Press: `press@sipsalabs.com`
- HuggingFace: [`huggingface.co/SipsaLabs`](https://huggingface.co/SipsaLabs)
- PyPI: [`pypi.org/project/ultracompress`](https://pypi.org/project/ultracompress/)
