Metadata-Version: 2.4
Name: ultracompress
Version: 0.5.6
Summary: Bit-exact 5-bit pack reconstruction with SHA-256 manifest verification — 22 architectures supported (1.7B-405B, dense + MoE + SSM). Patent-protected codec (USPTO 64/049,511 + 64/049,517). v0.5.x: Apache-2.0; v0.6+: BUSL-1.1 (free for sub-$1M ARR + research) — see https://github.com/sipsalabs/ultracompress/blob/master/LICENSE.
Author-email: "Sipsa Labs, Inc." <founder@sipsalabs.com>
License: BUSL-1.1
Project-URL: Homepage, https://github.com/sipsalabs/ultracompress
Project-URL: Repository, https://github.com/sipsalabs/ultracompress
Project-URL: Issues, https://github.com/sipsalabs/ultracompress/issues
Keywords: model-compression,llm,transformer,fractal,distillation,speculative-decoding
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Science/Research
Classifier: License :: Other/Proprietary License
Classifier: Programming Language :: Python :: 3
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Requires-Python: >=3.10
Description-Content-Type: text/markdown
License-File: LICENSE
License-File: LICENSE.apache
License-File: NOTICE.md
Requires-Dist: torch>=2.0
Requires-Dist: numpy>=1.24
Requires-Dist: transformers>=4.40
Provides-Extra: eval
Requires-Dist: lm-eval>=0.4; extra == "eval"
Requires-Dist: datasets>=2.0; extra == "eval"
Requires-Dist: scikit-learn>=1.3; extra == "eval"
Provides-Extra: serve
Requires-Dist: fastapi>=0.100; extra == "serve"
Requires-Dist: uvicorn>=0.20; extra == "serve"
Provides-Extra: all
Requires-Dist: lm-eval>=0.4; extra == "all"
Requires-Dist: datasets>=2.0; extra == "all"
Requires-Dist: scikit-learn>=1.3; extra == "all"
Requires-Dist: fastapi>=0.100; extra == "all"
Requires-Dist: uvicorn>=0.20; extra == "all"
Requires-Dist: scipy>=1.10; extra == "all"
Requires-Dist: matplotlib>=3.7; extra == "all"
Dynamic: license-file

# UltraCompress

Lossless 5-bit transformer compression. Bit-identical reconstruction guaranteed by a SHA-256 manifest.

[![PyPI](https://img.shields.io/badge/pypi-0.5.6-blue.svg)](https://pypi.org/project/ultracompress/)
[![License](https://img.shields.io/badge/license-Apache%202.0-blue.svg)](LICENSE)
[![Python 3.10+](https://img.shields.io/badge/python-3.10+-blue.svg)](https://www.python.org/downloads/)
[![Patent](https://img.shields.io/badge/USPTO-64%2F049%2C511%20%2B%2064%2F049%2C517-orange.svg)](./PATENT_NOTICE.md)

Hermes-3-Llama-3.1-405B compressed at 5 bpw lossless: **1.0066x PPL ratio** vs streaming bf16 teacher (5.0692 / 5.0358, n=50, seq_len=1024, FineWeb-edu held-out tail, seed=42). First 405B-class transformer compressed end-to-end on a single 32 GB consumer GPU. Reproduce in 3 commands.

UltraCompress takes a transformer at fp16/bf16 and produces a 5-bit pack you can verify against the original — not "1% PPL drift on WikiText," but a reconstruction the customer can re-derive byte-for-byte from the pack alone. The verifier rebuilds the original weights bit-identically, and the SHA-256 manifest fails loudly if anything drifted. That's the definition of lossless we care about: every weight is re-derivable from the pack itself, and verification is an unambiguous pass/fail.

It exists because the bf16-equivalent quality bar matters in places where "good enough on MMLU" isn't enough — defense, FDA-regulated healthcare, SR 11-7 model validation, internal red-team eval at frontier labs. And as a side-effect of the streaming compression path, it lets us put a 405B-parameter model through a single 32 GB consumer GPU without renting an H100 cluster.

We're a small company (Sipsa Labs, Inc.) shipping this in public while the patents are pending. Most days the lab notebook gets longer than the marketing site does. If you want to know what works, what doesn't, and what we tried this week that failed — read on.

---

## Try it (3 commands)

```bash
pip install ultracompress==0.5.6 "huggingface_hub[cli]"
hf download SipsaLabs/qwen3-1.7b-base-uc-v3-bpw5 --local-dir ./pack
uc verify ./pack
```

Expected output (real, not aspirational — this is what the v0.5.5 verifier prints on a clean pull of the 1.7B-Base artifact):

```
uc_pack_version: 3  (LOSSLESS, self-contained)
codec_source:    trainer-persisted
n_layers:        28
bpw:             5
Spot-check SHA256:
  layer_000.uc:  f87f2aeb3996ab7d…
  layer_014.uc:  …
  layer_027.uc:  …
Layer 0: 7 quantized Linears + 4 extras
All 7 Linear reconstructions have correct shapes.
Bundled scaffold: embed_tokens, model.norm, lm_head present.
→ VERIFY: PASS — bit-identical reconstruction guaranteed.
```

If you also want measured numbers on your hardware (TTFT, steady-state TPS, peak VRAM) — `uc bench ./pack`. Same JSON schema as our published numbers, runs on whatever GPU you have, no Sipsa-side claims to take on faith.

The pack above is ~1.1 GB. The qwen3-0.6b pack is ~0.4 GB if you want a faster smoke test.
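If you'd rather not trust the verifier either, the manifest contract is simple enough to re-check by hand. A minimal sketch, assuming a `manifest.json` that maps layer filenames to hex SHA-256 digests (the real v3 manifest layout belongs to the pack format; this snippet is illustrative, not `uc verify`'s implementation):

```python
import hashlib
import json
from pathlib import Path

def sha256_file(path: Path, chunk: int = 1 << 20) -> str:
    """Stream the file through SHA-256 so large layer files never load whole."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        while block := f.read(chunk):
            h.update(block)
    return h.hexdigest()

def verify_pack(pack_dir: str) -> bool:
    """True iff every file listed in the manifest hashes to its recorded digest."""
    pack = Path(pack_dir)
    manifest = json.loads((pack / "manifest.json").read_text())
    return all(sha256_file(pack / name) == digest
               for name, digest in manifest.items())
```

A tampered or truncated layer file changes its digest and the check fails, which is the whole point of the contract.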

---

## What works today (verified, with JSON receipts)

PyPI `v0.5.6` is the current release. Packs built since v0.5.5 are **self-contained** — they bundle LayerNorm + `embed_tokens` + `lm_head` inside the pack directory, so reproducing a published artifact no longer requires pulling the original bf16 alongside it. Expect ~622 MB of auxiliary data on top of the compressed body for a typical decoder vocabulary.

**End-to-end validated at 5 bpw across 22 transformer architectures** (dense 0.6B → 405B, MoE 47B → 235B, state-space). Of those, **16 have a verified PPL ratio against their bf16 baseline** on the FineWeb-edu held-out tail at seq_len=1024, seed=42; 6 are still pending eval. Every published number traces to a JSON in `scripts/overlay/artifacts/` or `docs/PPL_EVAL_*.json`.

The headline result and the tightest dense records currently public on HuggingFace:

| Model | Params | Class | PPL ratio | HF artifact | Status |
|---|---|---|---|---|---|
| Hermes-3-Llama-3.1-405B | 405B | First 405B-class lossless on single 32 GB consumer GPU | **1.0066** | [`SipsaLabs/hermes-3-llama-3.1-405b-uc-v3-bpw5`](https://huggingface.co/SipsaLabs/hermes-3-llama-3.1-405b-uc-v3-bpw5) | live |
| Qwen3-1.7B-Base | 1.7B | sub-0.5% drift | **1.00401** | `SipsaLabs/qwen3-1.7b-base-uc-v3-bpw5` | live |
| Qwen3-14B | 14.0B | sub-0.5% drift | **1.00403** | `SipsaLabs/qwen3-14b-uc-v3-bpw5` | live |
| Qwen3-8B | 8.0B | sub-0.5% drift | **1.00440** | `SipsaLabs/qwen3-8b-uc-v3-bpw5` | upload in flight |
| Mixtral-8x7B-v0.1 (MoE) | 47B (13B active) | sub-0.5% drift | **1.00368** | `SipsaLabs/mixtral-8x7b-v0.1-uc-v3-bpw5` | upload in flight |
| Phi-3.5-MoE-instruct | 42B (MoE 16-exp) | sub-0.5% drift | (eval pending this week) | `SipsaLabs/phi-3.5-moe-uc-v3-bpw5` | upload in flight |

Hermes-3-405B is the headline. The 1.0066x ratio is `5.0692 / 5.0358` — both halves of the fraction measured under the same per-layer streaming reconstruction comparator (n=50, seq_len=1024, FineWeb-edu held-out tail, seed=42). The bf16 teacher took 7.7 hours on cuda:1; the compressed pack took 14.3 hours. Pack body is ~251 GB, bit-identical SHA-256 reconstruction. The four Qwen3/Mixtral rows below it are the cleanest sub-1.005× dense/MoE references we have today; Phi-3.5-MoE is the 5th candidate and the eval is queued — number publishes the moment the JSON lands, not before.
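The ratio arithmetic is deliberately boring and easy to re-derive from the eval JSONs. A minimal sketch, assuming each JSON exposes a top-level `ppl` field (the actual schema is whatever the files in `scripts/overlay/artifacts/` define):

```python
import json

def ppl_ratio(compressed_json: str, baseline_json: str) -> float:
    """Compressed-pack perplexity over bf16-teacher perplexity.

    1.0 means no measurable drift; both halves must come from the
    same eval methodology for the ratio to mean anything.
    """
    with open(compressed_json) as f:
        compressed = json.load(f)["ppl"]
    with open(baseline_json) as f:
        baseline = json.load(f)["ppl"]
    return compressed / baseline

# 5.0692 / 5.0358 ≈ 1.0066, the headline ratio quoted above.
```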

Other notable verified results (full table in [Appendix](#appendix-full-architecture-matrix) below):

- **First lossless 5-bit state-space-model compression**: Mamba-2.8B at 1.0119 (codec-only path; the correction-layer path for SSMs hasn't landed yet, see "what doesn't work").
- **HuggingFace presence**: 39+ repos under [`huggingface.co/SipsaLabs`](https://huggingface.co/SipsaLabs).
- **PyPI**: [pypi.org/project/ultracompress](https://pypi.org/project/ultracompress/).

The `SipsaLabs` HuggingFace org page is the live source of truth. If a repo there has files committed, `uc verify` will pass on it after `hf download`.

---

## What doesn't work yet

Things people sometimes assume work because the rest of it does. They don't, and we'd rather you know:

- **Long-context evaluation past seq_len=1024.** Every PPL number above is at seq_len=1024 on the FineWeb-edu held-out tail. We have not yet run controlled evals at 4K/8K/32K context. If your workload depends on long-context behavior, treat the published ratios as "short-context evidence, long-context unmeasured." Eval harness for that lands in v0.6.
- **`uc compress` as a one-shot CLI.** v0.5.5 still requires the manual two-step (`scripts/overlay/stream_compress_e2e.py` then `pack_v3.pack_e2e_dir_v3`). One-shot `uc compress` ships in v0.6.
- **State-space models past the codec-only path.** Mamba-2.8B at 1.0119 is the SSM number, full stop. We tried two paths to add a correction layer on top of it — both made the result worse. The streaming compression runner has to be adapted for SSM-block iteration with real activations to break this; deferred. Documented as failures #1 and #2 in [HONEST_NEGATIVE_RESULTS](docs/HONEST_NEGATIVE_RESULTS_2026_05_08.md).
- **TinyLlama-1.1B-Chat PPL eval.** The pack itself verifies clean (`uc verify` PASS) and the HF artifact is uploaded. But the PPL eval forward pass throws a CUDA device-side assert that we haven't traced yet. The matrix notes the assert rather than showing a fabricated number.
- **Qwen3-32B and Llama-3.1-70B PPL ratios.** Both have local `uc verify` PASS; both have stale or suspect baseline PPL numbers we won't republish. Apples-to-apples re-evals at the standard methodology are queued.
- **Below 1.0040× on Qwen3-1.7B-Base.** This is our tightest dense floor. We tried tightening the configuration in 5 different ways this week: three landed within statistical noise, two were catastrophic regressions (1.0682× and 1.1306×). 1.0040× stands as the empirical floor at the current configuration. A next direction has been identified but is not yet validated.
- **HF uploads on residential bandwidth.** Several large-pack uploads (Mixtral-8x22B at 100GB, SmolLM2, Qwen3-0.6B) hit SSL EOF mid-stream. Our 8-attempt watchdog wrapper catches it but multi-hour residential uploads remain brittle. If a `SipsaLabs/...` HF repo shows in-flight in the matrix below, that's why.

---

## Why this isn't AWQ / GPTQ / EXL3

Every other 4–5 bit compression library targets a quality threshold ("sub-1% PPL on WikiText"). UltraCompress targets a **reconstruction contract**: the customer artifact is byte-equivalent on reload to what the trainer measured during distillation, and a SHA-256 manifest covers the pack end-to-end. If anything drifts, `uc verify` fails loudly; you don't have to take "it should be close" on faith.

This matters when "the model picks a slightly-wrong variable name" is a regulatory finding rather than a cosmetic complaint. In defense and aerospace, deploy-time bit-exactness is a compliance requirement. FDA-regulated healthcare AI requires model equivalence between dev and deploy. SR 11-7 (Federal Reserve model validation) requires reproducible audit recovery. And a frontier lab's red-team eval is only valid against the same inference path the team will actually deploy.

For pure-throughput inference on a fixed prompt distribution that matches your AWQ calibration set, with no downstream fine-tuning, AWQ at 4 bpw on vLLM is genuinely fine and we'll say so on a sales call. The Phase 0 POC is structured to find out: bring a model, we deliver a pack, you `uc bench` it on your hardware against your existing AWQ/GPTQ build. If we don't materially help, you keep the diagnostic and we don't push Phase 1.

The competitive intel gory details are in [docs/COMPETITIVE_LANDSCAPE_v3_LOSSLESS_2026_05_08.md](docs/COMPETITIVE_LANDSCAPE_v3_LOSSLESS_2026_05_08.md). The short version: as of 2026-05-09, a search of the public HuggingFace Hub for "5-bit lossless transformer compression" returns 0 results besides ours.

---

## Honest negative results

Most projects hide their failures. We catalogue them at the same level of detail as the wins, in [`docs/HONEST_NEGATIVE_RESULTS_2026_05_08.md`](docs/HONEST_NEGATIVE_RESULTS_2026_05_08.md). 15 entries covering the 2026-05-08 → 2026-05-09 research arc — ratio of catalogued failures to published wins is roughly 15:9 across those two days, and that's the ratio we'd want any external evaluator to use when assessing whether the positive numbers are real. They are.

A taste of what's in there:

- **A correction-layer warm-start applied to Mamba** — made PPL 0.07 pp WORSE than the codec-only baseline. A truncated low-rank projection on a high-rank residual injects noise the activation distribution doesn't want. Documented; the correction layer's value comes from training, not from the warm-start initialization.
- **A multi-pass cascade correction (failure mode #2 in the catalogue)** — hypothesis: two corrections in series capture more than one correction at constant param budget. Result: catastrophic 1.0682× (13.7× worse than uniform single-pass). Pass-1 cannot recover information that pass-0 already discarded. CLOSED — do not re-run.
- **An AWQ-style channel pre-scaling experiment (failure mode #3 in the catalogue)** — 1.1306× catastrophic regression (+13%, 26× worse than uniform). AWQ is designed for uniform-grid quantization where pre-scaling protects salient channels from rounding noise; our codec already adapts a learned non-uniform grid, so the round-trip just injects bias the correction layer then wastes its capacity correcting. CLOSED.
- **Pushing the configuration knobs harder on the Qwen3-1.7B-Base record** — predicted: tighter than 1.0040×. Actual: 1.0042×, within statistical noise. The configuration knob is saturated; the 1.0040× v1 number stands as the empirical floor. The fix is not in the codec but in correction-layer capacity allocation by depth.
- **"Base models compress tighter than instruct" hypothesis** — refuted on 2 of 3 architectures tested. Instruct fine-tuning's effect on quantization-friendliness is architecture-dependent, not universal. Hypothesis dropped; the table was published with the data and no hypothesis attached.

Researchers comparing 5-bit codecs should treat that file as the audit trail. It will save you from re-running experiments we already ran, and the LAB-NOTEBOOK entries it cites are the version of record.

---

## Who this is for

Direct, not aspirational:

- **If you serve LLMs in production and your VRAM bill is the constraint**, this might help. Streaming compression keeps peak compression-time VRAM bounded by ~one transformer layer, regardless of total model size (8.98 GB for Qwen2.5-72B; same recipe scales to 405B), and the v3 pack format is bit-exact-reproducible at inference time. Email `founder@sipsalabs.com` with your stack and a target latency/quality bar; we'll tell you honestly whether UC fits.
- **If you're a researcher comparing 5-bit codecs**, the ground-truth JSONs in `scripts/overlay/artifacts/` are the audit trail, the methodology is fixed in `BENCHMARKS_2026_05_09.json`, and the negative results doc above tells you what we already tried that didn't work. The Apache 2.0 license covers reproduction and citation freely.
- **If you're in a regulated domain** (defense, FDA-regulated healthcare, SR 11-7 model validation, frontier lab red-team), the bit-identical reconstruction contract is the actual reason to talk to us. Phase 0 POC ($5K, 5 business days, customer-picked model) gets you a pack you can audit yourself. Cover letter at [`docs/CUSTOMER_PHASE_0_POC_OFFER_LETTER.md`](docs/CUSTOMER_PHASE_0_POC_OFFER_LETTER.md).
- **If you're at a frontier lab** distributing internal model artifacts and want red-team eval fidelity preserved across deploy environments, the SHA-256 manifest exists for exactly that.

If your workload is "MMLU has to stay above X" and you're not pushing the model into long-tail or downstream-fine-tuning territory, AWQ at 4 bpw is probably a better answer than this. We'll say so.

---

## We're a small company looking for design partners

Sipsa Labs, Inc. is a small (currently solo-founder) shop. We filed two USPTO provisional patents in April 2026 (`64/049,511` + `64/049,517`) covering the underlying compression methods, the streaming compression mechanism, and the v3 lossless pack format; a supplement filing lands this week. The patent details are in [`PATENT_NOTICE.md`](./PATENT_NOTICE.md) — short version: Apache 2.0 grants you full use of the published source for any purpose including running it commercially on your own infrastructure, and we'd like a conversation if you're building a derivative product whose core value depends on the underlying invention. Email `founder@sipsalabs.com`.

We're cash-constrained pre-funding. Spending discipline is real: the only hard expense booked through end of June is the USPTO conversion fee. That means honest engagement keeps this shipping faster than anything else can:

- **Paid Phase 0 POC** — `founder@sipsalabs.com`, $5K / 5 business days / customer-picked model. The Day 7 deliverable is a pack you can self-verify with `uc verify` + benchmark with `uc bench`. Acceptance gate is `uc verify` PASS + PPL ratio within 1.5% on your eval set. Cadence is documented in [`docs/CUSTOMER_ONBOARDING_v0.5.5_2026_05_09.md`](docs/CUSTOMER_ONBOARDING_v0.5.5_2026_05_09.md).
- **GitHub Sponsors** — [github.com/sponsors/sipsalabs](https://github.com/sponsors/sipsalabs). Keeps the GPU bills paid while the rest of this gets to the next milestone.
- **Press / commentary** — `press@sipsalabs.com`. Most useful framing is "first 5-bit lossless library on the public HF Hub" and "first 405B compression on a single 32 GB consumer GPU" — both verifiable via the artifacts above.
- **Twitter** — `@SipsaLabs`. New account; if you found this repo first that's because we ship faster than we tweet.

If you're tracking the project: the lab notebook at [`docs/LAB-NOTEBOOK.md`](docs/LAB-NOTEBOOK.md) is updated daily and is the canonical "what shipped today" document.

---

## How v3 lossless actually works

`uc pack v3` persists, in the customer artifact, everything required to reconstruct the trainer's measured weights bit-identically: the codec state, per-block scales, bit-packed integer codes, the per-Linear correction layer trained against teacher activations, and (in v0.5.5) the model scaffold (`embed_tokens`, `model.norm`, `lm_head`) bundled inline so the pack is a self-contained model. A SHA-256 manifest covers every layer file.
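To make "bit-packed integer codes" concrete, here is a minimal sketch of an exact 5-bit pack/unpack round trip. This is illustrative only; the real v3 on-disk layout is defined by the codec, not by this snippet:

```python
def pack_5bit(codes: list[int]) -> bytes:
    """Pack integers in [0, 31] into a contiguous bitstream, 5 bits each, MSB-first."""
    buf, nbits = 0, 0
    out = bytearray()
    for c in codes:
        assert 0 <= c < 32
        buf = (buf << 5) | c
        nbits += 5
        while nbits >= 8:
            nbits -= 8
            out.append((buf >> nbits) & 0xFF)
    if nbits:  # flush the final partial byte, zero-padded on the right
        out.append((buf << (8 - nbits)) & 0xFF)
    return bytes(out)

def unpack_5bit(data: bytes, n: int) -> list[int]:
    """Recover exactly n 5-bit codes; the round trip is bit-exact by construction."""
    buf, nbits = 0, 0
    out: list[int] = []
    for byte in data:
        buf = (buf << 8) | byte
        nbits += 8
        while nbits >= 5 and len(out) < n:
            nbits -= 5
            out.append((buf >> nbits) & 0x1F)
    return out
```

Lossless-by-construction at the bit level is what lets the SHA-256 manifest be a pass/fail check rather than a tolerance.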

The reconstruction contract is the point. Because the codec, scales, and correction layer all live in the pack, the weights rebuilt at consumer-side load are byte-equivalent to what the trainer measured during distillation, and the forward pass produces not "close in PPL" but the same numerical result up to fp16 reduction order on the matmul itself. The verifier proves it: `uc verify` reports per-layer pass/fail and does not require a GPU.

The streaming compression path that makes this scale to 405B on one GPU keeps peak VRAM bounded by ~one transformer layer regardless of total depth (8.98 GB for Qwen2.5-72B; same shape for 405B). Compression time is roughly 1 minute per layer.
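The shape of that streaming loop, sketched with hypothetical `load`/`compress`/`save` callables (the real runner is `scripts/overlay/streaming_compression_runner.py`; nothing below is its actual API):

```python
from collections.abc import Callable, Iterator

def stream_layers(
    layer_paths: list[str],
    load: Callable,      # path -> layer weights (only this layer resident)
    compress: Callable,  # layer -> packed bytes (codec + correction fit alongside)
    save: Callable,      # (path, packed) -> output path; frees the source layer
) -> Iterator[str]:
    """Process one transformer layer at a time, so peak memory is bounded
    by a single layer regardless of total model depth."""
    for path in layer_paths:
        layer = load(path)
        packed = compress(layer)
        out = save(path, packed)
        del layer, packed  # release before the next layer loads
        yield out
```

The invariant worth testing is that at most one layer is ever live at once; total depth only changes wall-clock time, never the memory high-water mark.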

`scripts/overlay/streaming_compression_runner.py` is the runner. `scripts/overlay/eval_compressed_only.py` is the evaluator that produces the PPL JSONs in `scripts/overlay/artifacts/`. Both are in this repo, both are reproducible.

---

## Repository layout

```
ultracompress/
├── ultracompress/                Core library (pack v3, correction layer, CLI, __main__)
├── scaling/                      Cross-model teacher loaders (Qwen3 / Llama / Mistral / Mamba / OLMo)
├── scripts/overlay/              Streaming compression runner + evaluators + JSON artifacts
├── scripts/frr/                  Research-track architectural compression
├── tests/                        Regression tests
├── docs/
│   ├── HONEST_NEGATIVE_RESULTS_2026_05_08.md      ← the audit trail
│   ├── BENCHMARKS_2026_05_09.json                 ← machine-readable verified records
│   ├── CUSTOMER_ONBOARDING_v0.5.5_2026_05_09.md   ← Phase 0 POC walkthrough
│   ├── PUBLIC_VERIFICATION_DASHBOARD_2026_05_08.md
│   ├── COMPETITIVE_LANDSCAPE_v3_LOSSLESS_2026_05_08.md
│   └── LAB-NOTEBOOK.md                             ← daily research log
└── PATENT_NOTICE.md
```

---

## Appendix: full architecture matrix

22 architectures end-to-end, current state as of 2026-05-10. PPL = FineWeb-edu held-out tail, seq_len=1024, seed=42, against the model's own bf16 baseline on a single RTX 5090. Most rows use n=30 prompts; the 405B row uses n=50 with per-layer streaming reconstruction on both halves of the fraction (apples-to-apples comparator). Sub-baseline OLMo-2-Instruct (0.9998×) is a real measurement — compression appears to act as a faint regularizer at n=30 — not a typo.
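For readers re-deriving the table: perplexity here is the exponential of the mean per-token negative log-likelihood, and each ratio divides the compressed pack's PPL by its bf16 baseline's. A minimal sketch of that arithmetic (illustrative; the actual harness is `scripts/overlay/eval_compressed_only.py`):

```python
import math

def perplexity(token_nlls: list[float]) -> float:
    """PPL = exp of the mean per-token negative log-likelihood (in nats)."""
    return math.exp(sum(token_nlls) / len(token_nlls))

# Each table ratio is perplexity(compressed_nlls) / perplexity(baseline_nlls),
# with both NLL streams measured on the same held-out tokens.
```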

| Model | HF artifact | Params | Layers | PPL ratio |
|---|---|---|---|---|
| OLMo-2-0425-1B-Instruct | `olmo-2-0425-1b-instruct-uc-v3-bpw5` | 1.0B | 16 | **0.9998** |
| Phi-3-mini-4k-instruct | `phi-3-mini-4k-instruct-uc-v3-bpw5` | 3.8B | 32 | 1.00262 (caveat: seq_len=128) |
| Mixtral-8x7B-v0.1 (MoE) | `mixtral-8x7b-v0.1-uc-v3-bpw5` | 47B | 32 | **1.00368** |
| Qwen3-1.7B-Base | `qwen3-1.7b-base-uc-v3-bpw5` | 1.7B | 28 | **1.00401** |
| Qwen3-14B | `qwen3-14b-uc-v3-bpw5` | 14.0B | 40 | **1.00403** |
| Yi-1.5-9B | `yi-1.5-9b-uc-v3-bpw5` | 8.8B | — | 1.00414 |
| Qwen3-8B | `qwen3-8b-uc-v3-bpw5` | 8.0B | 36 | **1.00440** |
| Qwen3-0.6B | `qwen3-0.6b-uc-v3-bpw5` | 0.6B | 28 | 1.0069 |
| OLMo-2-0425-1B | `olmo-2-0425-1b-uc-v3-bpw5` | 1.0B | 16 | 1.0073 |
| SmolLM2-1.7B-Instruct | `smollm2-1.7b-instruct-uc-v3-bpw5` | 1.7B | 24 | 1.0075 |
| SmolLM2-1.7B | `smollm2-1.7b-uc-v3-bpw5` | 1.7B | 24 | 1.0085 |
| Mistral-7B-v0.3 | `mistral-7b-v0.3-uc-v3-bpw5` | 7.2B | 32 | 1.0100 |
| Mamba-2.8B (SSM) | `mamba-2.8b-hf-uc-v3-bpw5` | 2.8B | 64 | 1.0119 |
| Llama-3.1-8B | `llama-3.1-8b-uc-v3-bpw5` | 8.0B | 32 | 1.0125 |
| Qwen3-1.7B (Instruct) | `qwen3-1.7b-uc-v3-bpw5` | 1.7B | 28 | 1.0200 |
| Hermes-3-Llama-3.1-405B | `hermes-3-llama-3.1-405b-uc-v3-bpw5` | 405B | 126 | **1.0066** (5.0692 / 5.0358, n=50, per-layer streaming) |
| Qwen3-32B | `qwen3-32b-streaming-bpw5` | 32B | 64 | (re-eval pending) |
| Llama-3.1-70B | `llama-3.1-70b-uc-v3-bpw5` | 70B | 80 | (re-eval pending) |
| Qwen3-235B-A22B (MoE) | `qwen3-235b-a22b-uc-v3-bpw5` | 235B | 94 | (eval pending) |
| Mixtral-8x22B-v0.1 (MoE) | `mixtral-8x22b-v0.1-uc-v3-bpw5` | 141B | 56 | (eval pending) |
| Phi-3.5-MoE-instruct (MoE) | `phi-3.5-moe-uc-v3-bpw5` | 42B | 32 | (eval pending this week) |
| TinyLlama-1.1B-Chat | `tinyllama-1.1b-chat-v1.0-uc-v3-bpw5` | 1.1B | 22 | (CUDA assert in eval harness; pack verifies clean) |

---

## License

- **v0.6+** ships under the [Business Source License 1.1](./LICENSE) with an Additional Use Grant for research, individuals, and companies under $1M ARR. Auto-converts to Apache 2.0 four years after each release. See [NOTICE.md](./NOTICE.md) for the full why.
- **v0.5.x** stays under [Apache License 2.0](./LICENSE.apache) on the `legacy/0.5.x` branch — perpetual, never changing, freely usable. That commitment cannot be revoked.
- Above $1M ARR running v0.6+ in commercial production? `founder@sipsalabs.com`.
- Patent posture: [`PATENT_NOTICE.md`](./PATENT_NOTICE.md). USPTO provisionals `64/049,511` + `64/049,517` filed April 2026.

## Citation

```bibtex
@software{sipsa_ultracompress_2026,
  author = {{Sipsa Labs, Inc.}},
  title  = {UltraCompress: Lossless 5-bit Transformer Compression},
  year   = {2026},
  url    = {https://github.com/sipsalabs/ultracompress}
}
```

## Contact

- Commercial / Phase 0 POC: `founder@sipsalabs.com`
- Security: `security@sipsalabs.com`
- Press: `press@sipsalabs.com`
- HuggingFace: [`huggingface.co/SipsaLabs`](https://huggingface.co/SipsaLabs)
- PyPI: [`pypi.org/project/ultracompress`](https://pypi.org/project/ultracompress/)
- Sponsors: [`github.com/sponsors/sipsalabs`](https://github.com/sponsors/sipsalabs)
