Metadata-Version: 2.4
Name: ultracompress
Version: 0.6.15
Summary: Lossless 5-bit transformer compression — 14 architectures independently PPL-verified end-to-end (0.6B–405B; dense, MoE, SSM). Public CLI does pack structure + download-integrity checks; the codec is patent-pending and not distributed. BUSL-1.1 + Additional Use Grant.
Author-email: "Sipsa Labs, Inc." <founder@sipsalabs.com>
License: BUSL-1.1
Project-URL: Homepage, https://github.com/sipsalabs/ultracompress
Project-URL: Repository, https://github.com/sipsalabs/ultracompress
Project-URL: Issues, https://github.com/sipsalabs/ultracompress/issues
Keywords: model-compression,llm,transformer,quantization,inference
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Science/Research
Classifier: License :: Other/Proprietary License
Classifier: Programming Language :: Python :: 3
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Requires-Python: >=3.10
Description-Content-Type: text/markdown
License-File: LICENSE
License-File: LICENSE.apache
License-File: NOTICE.md
Dynamic: license-file

# UltraCompress

Lossless 5-bit transformer compression. Published model artifacts are bit-identical to their bf16 reference.

[![PyPI](https://img.shields.io/badge/pypi-0.6.15-blue.svg)](https://pypi.org/project/ultracompress/)
[![License](https://img.shields.io/badge/license-BUSL--1.1-blue.svg)](LICENSE)
[![Python 3.10+](https://img.shields.io/badge/python-3.10+-blue.svg)](https://www.python.org/downloads/)
[![Patent](https://img.shields.io/badge/patent-pending-orange.svg)](./PATENT_NOTICE.md)

> **v0.6.15:** the public package is intentionally minimal — a small,
> dependency-free CLI that checks pack **structure** and **download
> integrity** and prints project info. It contains **no** compression or
> reconstruction code: that methodology is patent-pending and is not
> distributed. Bit-identical reconstruction verification of a pack is
> performed by Sipsa Labs under engagement. The compressed model artifacts
> themselves remain public on HuggingFace.

Hermes-3-Llama-3.1-405B compressed at 5 bpw lossless: **1.0066x PPL ratio** vs streaming bf16 teacher (5.0692 / 5.0358, n=50, seq_len=1024, FineWeb-edu held-out tail, seed=42). A 405B-class transformer compressed end-to-end on a single 32 GB consumer GPU.

UltraCompress takes a transformer at fp16/bf16 and produces a 5-bit pack that reconstructs **bit-identically** to the reference bf16 checkpoint — not "1% PPL drift on WikiText," but a deterministic reconstruction. That is the honest definition of lossless we care about: an auditor can re-derive every weight from the pack, and Sipsa Labs verifies that bit-identity under engagement. The codec is patent-pending.

It exists because the bf16-equivalent quality bar matters in places where "good enough on MMLU" isn't enough — defense, FDA-regulated healthcare, SR 11-7 model validation, internal red-team eval at frontier labs. And as a side-effect of the streaming compression path, it lets us put a 405B-parameter model through a single 32 GB consumer GPU without renting an H100 cluster.

We're a small lab shipping this in public while the patents are pending. Most days the lab notebook gets longer than the marketing site does.

---

## The public CLI (what `pip install` gives you)

```bash
pip install ultracompress
uc info
```

```
uc verify <pack_dir>   pack structure + download-integrity self-check
uc info                what this package is + links/contact
uc version             print version
```

`uc verify` confirms a downloaded pack is well-formed (manifest present and parseable, declared layer count matches the files on disk, no zero-byte layers) and prints a stable SHA-256 **pack fingerprint** so you can confirm you hold a byte-identical download, or compare against a fingerprint we publish out of band. It does **not** reconstruct weights and contains no codec knowledge by design.

```bash
hf download SipsaLabs/qwen3-1.7b-base-uc-v3-bpw5 --local-dir ./pack
uc verify ./pack
```

```
uc_pack_version: 3
bpw:             5
layer files:     28
SHA-256 (spot-check; use --full for all):
  manifest.json:f3a1…
  layer_000.uc:7c2b…
  layer_014.uc:9d4f…
  layer_027.uc:1ab8…
pack fingerprint (sha256 of sorted file digests):
  4e9c… (64 hex)

→ STRUCTURE OK — pack is well-formed; fingerprint above is the
  download-integrity reference. This is NOT a reconstruction proof;
  bit-identical reconstruction verification is provided by Sipsa
  Labs under engagement (founder@sipsalabs.com).
```

Full bit-identical reconstruction verification (and PPL re-evaluation against the bf16 baseline) is an auditor-grade deliverable Sipsa Labs runs with you under engagement — it is deliberately not shipped in the public package.

---

## What's verified (with JSON receipts)

**14 architectures independently PPL-verified end-to-end** (0.6B → 405B, dense + MoE + state-space) against each model's own bf16 baseline on the FineWeb-edu held-out tail at seq_len=1024, seed=42. Every published number traces to a published result JSON. A small set of packs is publicly downloadable; the full catalog is available to customers under engagement.

| Model | Params | Class | PPL ratio | HF artifact | Status |
|---|---|---|---|---|---|
| Hermes-3-Llama-3.1-405B | 405B | 405B-class lossless on a single 32 GB consumer GPU | **1.0066** | [`SipsaLabs/hermes-3-llama-3.1-405b-uc-v3-bpw5`](https://huggingface.co/SipsaLabs/hermes-3-llama-3.1-405b-uc-v3-bpw5) | live |
| Mistral-7B-v0.3 | 7.2B | sub-0.6% drift | **1.00548** | [`SipsaLabs/mistral-7b-v0.3-uc-v3-bpw5`](https://huggingface.co/SipsaLabs/mistral-7b-v0.3-uc-v3-bpw5) | live |
| Qwen3-1.7B-Base | 1.7B | sub-0.5% drift | **1.00401** | `SipsaLabs/qwen3-1.7b-base-uc-v3-bpw5` | live |
| Qwen3-14B | 14.0B | sub-0.5% drift | **1.00403** | `SipsaLabs/qwen3-14b-uc-v3-bpw5` | live |
| Qwen3-8B | 8.0B | sub-0.5% drift | **1.00440** | `SipsaLabs/qwen3-8b-uc-v3-bpw5` | live |
| Mixtral-8x7B-v0.1 (MoE) | 47B (13B active) | sub-0.5% drift | **1.00368** | `SipsaLabs/mixtral-8x7b-v0.1-uc-v3-bpw5` | live |
| Phi-3-mini-4k-instruct | 3.8B | sub-0.3% drift (seq_len=128, not apples-to-apples) | **1.00262** | `SipsaLabs/phi-3-mini-4k-instruct-uc-v3-bpw5` | live |

Hermes-3-405B is the headline. The 1.0066x ratio is `5.0692 / 5.0358` — both halves measured under the same per-layer streaming reconstruction comparator (n=50, seq_len=1024, FineWeb-edu held-out tail, seed=42). The bf16 teacher took 7.7 hours on cuda:1; the 5-bpw pack took 14.3 hours. The Mistral-7B 1.00548× row is the tightest dense 7B-class lossless 5-bit ratio we currently publish.

- **Lossless 5-bit state-space-model compression**: Mamba-2.8B at 1.0119.
- **HuggingFace**: a small public verification set under [`huggingface.co/SipsaLabs`](https://huggingface.co/SipsaLabs); full catalog under engagement.
- **PyPI**: [pypi.org/project/ultracompress](https://pypi.org/project/ultracompress/).

---

## What doesn't work yet

Things people sometimes assume work because the rest of it does. They don't, and we'd rather you know:

- **Long-context evaluation past seq_len=1024.** Every PPL number above is at seq_len=1024 on the FineWeb-edu held-out tail. We have not yet run controlled evals at 4K/8K/32K context.
- **State-space models past the current SSM result.** Mamba-2.8B at 1.0119 is the SSM number, full stop. We tried two tighter paths on top — both made it worse.
- **TinyLlama-1.1B-Chat PPL eval.** The pack itself is well-formed and the HF artifact uploaded, but the PPL eval forward pass throws a CUDA device-side assert that we haven't traced yet. Shown as deferred, not a fabricated number.
- **Qwen3-32B and Llama-3.1-70B PPL ratios.** Both have stale or suspect baseline PPL numbers we won't republish. Apples-to-apples re-evals are queued.
- **Below 1.0040× on Qwen3-1.7B-Base.** This is our tightest dense floor; we tried 5 different paths to break it. Three were within noise; two were catastrophic regressions. 1.0040× stands as the empirical floor at the current configuration.

---

## Why this isn't AWQ / GPTQ / EXL3

Every other 4–5 bit compression library targets a quality threshold ("sub-1% PPL on WikiText"). UltraCompress targets a **reconstruction contract**: the published artifact reconstructs bit-identically to the reference bf16 checkpoint. Codec internals are patent-pending and deliberately not described here.

This matters when "the model picks a slightly-wrong variable name" is a regulatory finding rather than a cosmetic complaint. Defense / aerospace deploy-bit-exactness is a compliance requirement. FDA-regulated healthcare AI requires model equivalence between dev and deploy. SR 11-7 (Federal Reserve model validation) requires reproducible audit recovery.

For pure-throughput inference on a fixed prompt distribution that matches your AWQ calibration set, with no downstream fine-tuning, AWQ at 4 bpw on vLLM is genuinely fine and we'll say so on a sales call.

As of mid-2026 we are not aware of another published library targeting a bit-identical reconstruction contract (as opposed to a PPL-threshold) for 5-bit transformer compression on the public HuggingFace Hub. If you find one, tell us — we'd rather benchmark against it than claim a gap that isn't there.

---

## Honest negative results

Most projects hide their failures. We catalogue them at the same level of detail as the wins.

- **An initialization shortcut we tried** — made PPL 0.07 pp WORSE on Mamba and was discarded. Method specifics withheld (patent-pending).
- **A multi-pass variant we hypothesized would help** — produced a catastrophic 13.7× regression vs. the single-pass baseline. CLOSED.
- **Importing an AWQ-style pre-scaling step** — produced a catastrophic +13% regression and was ruled out. CLOSED.
- **Pushing the training schedule past the current configuration** — gained nothing (within noise). The floor stands.
- **"Base models compress tighter than instruct" hypothesis** — refuted 2/3 of architectures. Dropped.

Detailed methodology for any specific failure is available to design partners under NDA.

---

## Who this is for

- **If you serve LLMs in production and your VRAM bill is the constraint**, this might help. It scales to a 405B-class model on a single 32 GB consumer GPU (the how is patent-pending). Email `founder@sipsalabs.com` with your stack and a target latency/quality bar; we'll tell you honestly whether UC fits.
- **If you're in a regulated domain** (defense, FDA-regulated healthcare, SR 11-7 model validation, frontier lab red-team), the bit-identical reconstruction contract is the reason to talk to us. Phase 0 POC ($5K, 5 business days, customer-picked model) gets you a pack plus a Sipsa-run bit-identity + PPL audit you can review. Email `founder@sipsalabs.com`.

If your workload is "MMLU has to stay above X" and you're not pushing the model into long-tail or downstream-fine-tuning territory, AWQ at 4 bpw is probably a better answer than this. We'll say so.

---

## We're a small company looking for design partners

Sipsa Labs is a small lab shipping in public. Our compression methods are patent-pending; details are in [`PATENT_NOTICE.md`](./PATENT_NOTICE.md). The CLI source is BUSL-1.1 with an Additional Use Grant — free for companies under $1M ARR, research, and individuals, auto-converting to Apache 2.0 four years post-release. If you're building a derivative product whose core value depends on the underlying invention, email `founder@sipsalabs.com`.

- **Paid Phase 0 POC** — `founder@sipsalabs.com`, $5K / 5 business days / customer-picked model. Deliverable: a pack plus a Sipsa-run bit-identity + PPL audit on your eval set.
- **GitHub Sponsors** — [github.com/sponsors/sipsalabs](https://github.com/sponsors/sipsalabs).
- **Press / commentary** — `press@sipsalabs.com`.

---

## License

- Released under **BUSL-1.1 with an Additional Use Grant** (free for companies under $1M ARR, research, and individuals; auto-converts to Apache 2.0 four years post-release). See [`LICENSE`](./LICENSE).
- The license grant does **not** extend to the patent-pending compression methodology that produces the artifacts. See [`PATENT_NOTICE.md`](./PATENT_NOTICE.md).
- Pre-compressed model artifacts on HuggingFace carry the upstream teacher model's license plus this project's patent terms.

## Citation

```bibtex
@software{sipsa_ultracompress_2026,
  author = {{Sipsa Labs, Inc.}},
  title  = {UltraCompress: Lossless 5-bit Transformer Compression},
  year   = {2026},
  url    = {https://github.com/sipsalabs/ultracompress}
}
```

## Contact

- Commercial / Phase 0 POC: `founder@sipsalabs.com`
- Security: `security@sipsalabs.com`
- Press: `press@sipsalabs.com`
- HuggingFace: [`huggingface.co/SipsaLabs`](https://huggingface.co/SipsaLabs)
- PyPI: [`pypi.org/project/ultracompress`](https://pypi.org/project/ultracompress/)
