Metadata-Version: 2.4
Name: bigsmall
Version: 3.14.1
Summary: Lossless AI model compression — make any model 34% smaller with bit-identical weights, drop-in replacement for HuggingFace from_pretrained
Home-page: https://github.com/wpferrell/Bigsmall
Author: Will Ferrell
Author-email: wpferrell@gmail.com
License: Elastic-2.0
Project-URL: Paper, https://doi.org/10.5281/zenodo.20279248
Project-URL: Bug Tracker, https://github.com/wpferrell/Bigsmall/issues
Keywords: neural network,compression,lossless,machine learning,model compression,pytorch,huggingface,transformers,bfloat16,bf16,delta compression,fine-tuning,inference,vram,llm,ai,weights,safetensors,arithmetic coding,entropy coding
Classifier: Development Status :: 5 - Production/Stable
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Science/Research
Classifier: License :: Other/Proprietary License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Classifier: Topic :: System :: Archiving :: Compression
Requires-Python: >=3.9
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: numpy>=1.24
Requires-Dist: numba>=0.61
Requires-Dist: constriction>=0.4
Requires-Dist: zstandard>=0.21
Requires-Dist: blosc2>=2.0
Requires-Dist: safetensors>=0.4
Requires-Dist: huggingface-hub>=0.20
Requires-Dist: tqdm>=4.0
Provides-Extra: torch
Requires-Dist: torch>=2.0; extra == "torch"
Provides-Extra: hf
Requires-Dist: transformers>=4.30; extra == "hf"
Requires-Dist: huggingface-hub>=0.20; extra == "hf"
Provides-Extra: diffusion
Requires-Dist: diffusers>=0.20; extra == "diffusion"
Provides-Extra: vllm
Requires-Dist: vllm>=0.4; extra == "vllm"
Provides-Extra: ecc
Requires-Dist: reedsolo>=1.7; extra == "ecc"
Provides-Extra: all
Requires-Dist: torch>=2.0; extra == "all"
Requires-Dist: transformers>=4.30; extra == "all"
Requires-Dist: diffusers>=0.20; extra == "all"
Requires-Dist: huggingface-hub>=0.20; extra == "all"
Requires-Dist: reedsolo>=1.7; extra == "all"
Dynamic: author
Dynamic: author-email
Dynamic: classifier
Dynamic: description
Dynamic: description-content-type
Dynamic: home-page
Dynamic: keywords
Dynamic: license
Dynamic: license-file
Dynamic: project-url
Dynamic: provides-extra
Dynamic: requires-dist
Dynamic: requires-python
Dynamic: summary

[![PyPI version](https://img.shields.io/pypi/v/bigsmall.svg)](https://pypi.org/project/bigsmall/)
[![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.20279248.svg)](https://doi.org/10.5281/zenodo.20279248)
[![License](https://img.shields.io/badge/license-Elastic--2.0-blue.svg)](https://github.com/wpferrell/Bigsmall/blob/main/LICENSE)
[![Python](https://img.shields.io/pypi/pyversions/bigsmall.svg)](https://pypi.org/project/bigsmall/)
[![Downloads](https://static.pepy.tech/badge/bigsmall)](https://pepy.tech/project/bigsmall)

# BigSmall — Lossless AI Model Compression

**Make any AI model ~34% smaller. Bit-identical weights. Drop-in replacement for `from_pretrained`.**

```bash
pip install bigsmall
```

A 14 GB Mistral-7B becomes 9.3 GB. A fine-tuned model becomes a 5 GB patch on top of its 14 GB base. Every weight in the decompressed model is **bit-for-bit identical** to the original, md5-verified on every tensor.

| **~34%** smaller | **~65%** smaller as a delta patch | **25+** ready-to-use models |
|:---:|:---:|:---:|
| any BF16 LLM | fine-tunes vs their base | [on HuggingFace](https://huggingface.co/wpferrell) |

---

## What BigSmall does

Three use cases. Pick the one that fits.

### 1. Make any model smaller

```bash
bigsmall compress mistral-7b/ -o mistral-7b.bs
bigsmall decompress mistral-7b.bs -o mistral-7b-restored/
```

**Before:** 14.2 GB of safetensors. **After:** 9.3 GB `.bs` file. **Saved:** 4.9 GB (34%).

Every weight is bit-for-bit identical, so every calculation the model does is identical to the original. Works on any safetensors model: LLMs, diffusion, audio, vision. Savings vary by model; the published models listed below range from 17% to 39%.

### 2. Store fine-tunes as tiny patches

```bash
bigsmall compress qwen-instruct/ --delta-from qwen-base/ -o instruct.bs
bigsmall apply qwen-base/ instruct.bs -o qwen-instruct-restored/
```

**Before:** 14.2 GB Qwen2.5-7B-Instruct. **After:** ~5 GB patch. **Saved:** ~9 GB (~65%).

If your users already have the public base model, they only need to download what *changed*. This is the biggest win in BigSmall. Use it for any fine-tune: instruction tuning, DPO, RLHF, domain adaptation, LoRA-merged checkpoints.
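To see why a patch can be much smaller than the full checkpoint, here is a minimal sketch using only the Python standard library. It is an illustration under stated assumptions, not BigSmall's codec: the XOR delta, the `zlib` compressor, and the toy data (random 16-bit words whose fine-tuned copy differs only in the low three bits) are all stand-ins chosen for the example.

```python
import random
import zlib
from array import array


def xor_delta(base: bytes, tuned: bytes) -> bytes:
    """Byte-wise XOR of two equal-length buffers (XOR is its own inverse)."""
    if len(base) != len(tuned):
        raise ValueError("buffers must have the same length")
    return bytes(a ^ b for a, b in zip(base, tuned))


def make_patch(base: bytes, tuned: bytes) -> bytes:
    """Compress only the bits that differ between base and fine-tune."""
    return zlib.compress(xor_delta(base, tuned), 9)


def apply_patch(base: bytes, patch: bytes) -> bytes:
    """Rebuild the fine-tune exactly from the base plus the patch."""
    return xor_delta(base, zlib.decompress(patch))


# Toy "weights": 50,000 random 16-bit words. The hypothetical fine-tune
# flips only the three lowest bits of each word.
rng = random.Random(0)
base_words = array("H", (rng.getrandbits(16) for _ in range(50_000)))
tuned_words = array("H", (w ^ rng.getrandbits(3) for w in base_words))
base, tuned = base_words.tobytes(), tuned_words.tobytes()

patch = make_patch(base, tuned)
restored = apply_patch(base, patch)
standalone = zlib.compress(tuned, 9)

assert restored == tuned  # lossless: every byte comes back
print(f"standalone: {len(standalone)} B, patch: {len(patch)} B")
```

The fine-tune on its own looks like noise to the compressor, while the delta is mostly zeros and compresses well. Real fine-tunes are not this tidy, so the ratio here says nothing about the ~65% figure above.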

### 3. Download smaller, use instantly

```python
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "wpferrell/phi-3.5-mini-instruct-bigsmall"
)
```

Works exactly like a normal HuggingFace model — BigSmall decompresses transparently on load. **25+ pre-compressed models** ready to use ([browse them all](https://huggingface.co/wpferrell)).

Prefer the CLI?

```bash
bigsmall decompress wpferrell/phi-3.5-mini-instruct-bigsmall -o phi-3.5-mini/
```

---

## Compression numbers (every published model)

Every row is a real measurement. Click a model to download it.

| Model | Original | BigSmall | Saved |
|---|---:|---:|---:|
| [Qwen2.5-32B-Instruct](https://huggingface.co/wpferrell/qwen2.5-32b-instruct-bigsmall) | 61.0 GB | 40.3 GB | 34% |
| [Gemma-3-27B-it](https://huggingface.co/wpferrell/gemma-3-27b-it-bigsmall) | 51.1 GB | 33.4 GB | 35% |
| [Qwen2.5-14B-Instruct](https://huggingface.co/wpferrell/qwen2.5-14b-instruct-bigsmall) | 29.5 GB | 19.5 GB | 34% |
| [Gemma-3-12B-it](https://huggingface.co/wpferrell/gemma-3-12b-it-bigsmall) | 22.7 GB | 14.8 GB | 35% |
| [Gemma-2-9B-it](https://huggingface.co/wpferrell/gemma-2-9b-it-bigsmall) | 17.2 GB | 11.3 GB | 34% |
| [Llama-3.1-8B-Instruct](https://huggingface.co/wpferrell/llama-3.1-8b-instruct-bigsmall) | 15.0 GB | 9.7 GB | 35% |
| [Llama-3-8B-Instruct](https://huggingface.co/wpferrell/llama-3-8b-instruct-bigsmall) | 15.0 GB | 9.8 GB | 34% |
| [Qwen3-8B](https://huggingface.co/wpferrell/qwen3-8b-bigsmall) | 15.3 GB | 10.1 GB | 34% |
| [Mistral-7B-Instruct v0.3](https://huggingface.co/wpferrell/mistral-7b-instruct-bigsmall) | 14.2 GB | 8.9 GB | 37% |
| [Mistral-7B-Instruct v0.2](https://huggingface.co/wpferrell/mistral-7b-instruct-v0.2-bigsmall) | 14.2 GB | 8.9 GB | 37% |
| [Qwen2.5-7B-Instruct](https://huggingface.co/wpferrell/qwen2.5-7b-instruct-bigsmall) | 14.2 GB | 9.4 GB | 34% |
| [Phi-3.5-mini-instruct](https://huggingface.co/wpferrell/phi-3.5-mini-instruct-bigsmall) | 7.1 GB | 4.7 GB | 34% |
| [Gemma-3-4B-it](https://huggingface.co/wpferrell/gemma-3-4b-it-bigsmall) | 8.0 GB | 5.2 GB | 35% |
| [Qwen3-4B-Instruct](https://huggingface.co/wpferrell/qwen3-4b-instruct-bigsmall) | 7.5 GB | 5.0 GB | 34% |
| [Llama-3.2-3B-Instruct](https://huggingface.co/wpferrell/llama-3.2-3b-instruct-bigsmall) | 6.4 GB | 3.9 GB | 39% |
| [Gemma-2-2B-it](https://huggingface.co/wpferrell/gemma-2-2b-it-bigsmall) | 4.9 GB | 3.2 GB | 34% |
| [Qwen2.5-3B-Instruct](https://huggingface.co/wpferrell/qwen2.5-3b-instruct-bigsmall) | 5.7 GB | 3.8 GB | 34% |
| [Qwen2.5-1.5B-Instruct](https://huggingface.co/wpferrell/qwen2.5-1.5b-instruct-bigsmall) | 2.9 GB | 1.9 GB | 34% |
| [Llama-3.2-1B-Instruct](https://huggingface.co/wpferrell/llama-3.2-1b-instruct-bigsmall) | 2.3 GB | 1.5 GB | 34% |
| [Gemma-3-1B-it](https://huggingface.co/wpferrell/gemma-3-1b-it-bigsmall) | 1.9 GB | 1.2 GB | 35% |
| [Qwen2.5-0.5B-Instruct](https://huggingface.co/wpferrell/qwen2.5-0.5b-instruct-bigsmall) | 920 MB | 610 MB | 34% |
| [GPT-2 (117M)](https://huggingface.co/wpferrell/gpt2-bigsmall) | 548 MB | 414 MB | 24% |
| [Gemma-3-270M-it](https://huggingface.co/wpferrell/gemma-3-270m-it-bigsmall) | 500 MB | 330 MB | 34% |
| [Gemma-3-270M](https://huggingface.co/wpferrell/gemma-3-270m-bigsmall) | 500 MB | 330 MB | 34% |
| [Gemma-2-2B](https://huggingface.co/wpferrell/gemma-2-2b-bigsmall) | 9.7 GB | 8.1 GB | 17% |

[Browse all 25+ models on HuggingFace →](https://huggingface.co/wpferrell)

---

## What "lossless" actually means

Every weight in the model is **mathematically identical** to the original — same bit pattern, same floating-point value, same gradient, same output.

- **Not quantization.** Quantization rounds weights to fewer bits and the model's behaviour changes.
- **Not pruning.** Pruning deletes weights.
- **Not approximation.** No tricks, no calibration data, no quality drop.

BigSmall finds redundancy in the bit pattern of neural weights and stores it more compactly. It is the same idea as ZIP for text, but tuned for BF16 floating-point distributions. **md5 is verified on every tensor** at decompression. If a single bit differs, verification fails.
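The idea of redundancy in BF16 bit patterns can be made concrete with a small standard-library sketch. Everything here is an assumption made for illustration: the weights are synthetic Gaussian values, `zlib` stands in for a real entropy coder, and the byte-plane split is one simple way to expose the skew, not a description of BigSmall's codecs. The high byte of each BF16 value (sign plus most of the exponent) takes only a few distinct values, while the low byte is close to random.

```python
import hashlib
import random
import struct
import zlib


def to_bf16_bytes(values):
    """Truncate float32 values to BF16 by keeping the top 16 bits (little-endian)."""
    out = bytearray()
    for v in values:
        out += struct.pack("<f", v)[2:]  # bytes 2..3: sign, exponent, top mantissa bits
    return bytes(out)


def split_compress(raw: bytes):
    """Compress the low-byte and high-byte planes separately."""
    return zlib.compress(raw[0::2], 9), zlib.compress(raw[1::2], 9)


def join_decompress(low_z: bytes, high_z: bytes) -> bytes:
    """Invert split_compress by re-interleaving the two planes."""
    low, high = zlib.decompress(low_z), zlib.decompress(high_z)
    out = bytearray(len(low) + len(high))
    out[0::2] = low
    out[1::2] = high
    return bytes(out)


rng = random.Random(0)
raw = to_bf16_bytes(rng.gauss(0.0, 0.02) for _ in range(20_000))

low_z, high_z = split_compress(raw)
restored = join_decompress(low_z, high_z)

# Same kind of check the README describes: hashes must match exactly.
assert hashlib.md5(restored).digest() == hashlib.md5(raw).digest()
print(f"raw {len(raw)} B, low plane {len(low_z)} B, high plane {len(high_z)} B")
```

Nearly all of the saving comes from the high-byte plane; the low-byte plane barely shrinks. A purpose-built coder would be expected to do better than `zlib` on the same data.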

---

## CLI reference

```
bigsmall compress SRC [-o OUT] [--delta-from BASE] [--auto-delta] [--resume] [--ecc]
bigsmall decompress SRC [-o OUT] [--base BASE]
bigsmall info SRC.bs                       size, ratio, codecs used
bigsmall scan SRC                          analyse before compressing
bigsmall verify SRC.bs [--fast|--sample N] integrity check
bigsmall diff A.bs B.bs [--patch P.bs]     compare or write a delta
bigsmall apply BASE PATCH.bs -o OUT        reconstruct from base + patch
bigsmall repair SRC.bs [-o OUT]            recover via Reed-Solomon ECC sidecar
bigsmall benchmark SRC                     encode/decode throughput
bigsmall migrate SRC.bs                    re-encode with current codecs
bigsmall status                            list your BigSmall HF repos
bigsmall pipeline run SRC DST              resumable download → compress → upload
```

Every command has `--help`. See [docs/cli-reference.md](docs/cli-reference.md) for full examples.
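For scripting, the `compress` command can be driven from Python with `subprocess`. The helper below is a hypothetical convenience wrapper, not part of the package; it only assembles flags that appear in the reference above (`-o`, `--delta-from`, `--resume`, `--ecc`).

```python
import shlex
import subprocess


def compress_argv(src, out, delta_from=None, resume=False, ecc=False):
    """Build the argument list for `bigsmall compress` from documented flags."""
    argv = ["bigsmall", "compress", src, "-o", out]
    if delta_from is not None:
        argv += ["--delta-from", delta_from]
    if resume:
        argv.append("--resume")
    if ecc:
        argv.append("--ecc")
    return argv


def run_compress(*args, **kwargs):
    """Run the command; requires the bigsmall CLI to be on PATH."""
    return subprocess.run(compress_argv(*args, **kwargs), check=True)


argv = compress_argv("qwen-instruct/", "instruct.bs",
                     delta_from="qwen-base/", resume=True)
print(shlex.join(argv))
```

Passing a list to `subprocess.run` avoids shell quoting problems with paths that contain spaces.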

---

## Python API

```python
import bigsmall

# Round-trip a model
bigsmall.compress("model/", "model.bs")
bigsmall.decompress("model.bs", "model_back/")

# Fine-tune as a delta patch
bigsmall.compress("finetune/", "patch.bs", delta_from="base/")
bigsmall.apply("base/", "patch.bs", "finetune_back/")

# Inspect before compressing
bigsmall.detect_bf16_native("model/")
bigsmall.scan_model("model/")

# Low-VRAM streaming inference (~12× less VRAM than from_pretrained)
from bigsmall import BigSmallStreamingModel
model = BigSmallStreamingModel.from_pretrained(
    "wpferrell/phi-3.5-mini-instruct-bigsmall",
    device="cuda",
    lru_max_vram_gb=2.0,
)
```
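If you want an independent check after a round trip, you can hash the files yourself. The sketch below uses only the standard library; `compare_dirs` and `file_md5` are hypothetical helpers, not BigSmall functions. It assumes the restored `.safetensors` files are byte-identical to the originals, which is a stricter condition than the per-tensor identity described above: if the file header layout differs, files could differ even though every tensor matches.

```python
import hashlib
import tempfile
from pathlib import Path


def file_md5(path, chunk_size=1 << 20):
    """MD5 of a file, read in chunks so large checkpoints fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def compare_dirs(original, restored, pattern="*.safetensors"):
    """Return names of files that are missing from `restored` or differ."""
    mismatched = []
    for src in sorted(Path(original).glob(pattern)):
        dst = Path(restored) / src.name
        if not dst.is_file() or file_md5(src) != file_md5(dst):
            mismatched.append(src.name)
    return mismatched


# Demo on throwaway files standing in for real checkpoints.
with tempfile.TemporaryDirectory() as tmp:
    original, restored = Path(tmp, "model"), Path(tmp, "model_back")
    original.mkdir()
    restored.mkdir()
    (original / "weights.safetensors").write_bytes(b"\x01\x02\x03")
    (restored / "weights.safetensors").write_bytes(b"\x01\x02\x03")
    demo_result = compare_dirs(original, restored)

print(demo_result)
```

In practice you would call `compare_dirs("model/", "model_back/")` after the round trip shown above and expect an empty list.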

---

## What's new in v3.13

- **Delta compression** — fine-tunes are now ~34% of full model size as a patch on the base.
- **Auto-detect the base** — `--auto-delta` finds the base by fingerprint so you don't have to.
- **BF16-native F32 detection** — F32 models that are secretly BF16 (Whisper, several HF checkpoints) now compress 40%+ better automatically.
- **Resume** — interrupted compression picks up exactly where it left off (`--resume`).
- **Fast verify** — `bigsmall verify --fast` checks integrity in seconds; `--sample 0.001` catches in-blob corruption without a full decode.
- **mmap decode** — large `.bs` files (>256 MB) memory-map instead of fully reading into RAM.
- **Reed-Solomon ECC** — `--ecc` writes a parity sidecar; `bigsmall repair` fixes bit-rot.
- **New CLI commands** — `info`, `scan`, `diff`, `apply`, `repair`.
- **Streaming LRU** — `BigSmallStreamingModel(lru_max_vram_gb=2.0)` keeps hot layers in VRAM.
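
The streaming LRU idea can be sketched as a cache with a byte budget: recently used layers stay resident, and the least recently used ones are evicted once the budget is exceeded. This is a generic illustration, not BigSmall's implementation; the `ByteBudgetLRU` class and its method names are hypothetical.

```python
from collections import OrderedDict


class ByteBudgetLRU:
    """Keep the most recently used items whose sizes fit in max_bytes."""

    def __init__(self, max_bytes):
        self.max_bytes = max_bytes
        self.used = 0
        self._items = OrderedDict()  # name -> (value, size), oldest first

    def __contains__(self, name):
        return name in self._items

    def get(self, name):
        """Return the cached value and mark it as most recently used."""
        if name not in self._items:
            return None
        self._items.move_to_end(name)
        return self._items[name][0]

    def put(self, name, value, size):
        """Insert a value, then evict oldest entries until within budget."""
        if name in self._items:
            self.used -= self._items.pop(name)[1]
        self._items[name] = (value, size)
        self.used += size
        # Always keep the newest entry, even if it alone exceeds the budget.
        while self.used > self.max_bytes and len(self._items) > 1:
            _, (_, evicted_size) = self._items.popitem(last=False)
            self.used -= evicted_size


cache = ByteBudgetLRU(max_bytes=3)
for i in range(3):
    cache.put(f"layer{i}", object(), size=1)
cache.get("layer0")                    # layer0 becomes the hottest entry
cache.put("layer3", object(), size=1)  # over budget: layer1 is evicted
print("layer0" in cache, "layer1" in cache)
```

With a budget expressed in gigabytes of VRAM, the same structure decides which decompressed layers stay on the device between forward passes.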

[Full changelog →](CHANGELOG.md)

---

## Research

The lossless compression ceiling for BF16 neural weights has been measured. Compressed size bottoms out at **~62% of raw BF16 for any model** and **~34% for fine-tunes** stored as a delta against their base. We ran 300+ experiments across every known mathematical approach (entropy coding, cross-tensor prediction, learned translators, persistent homology, optimal transport, quantum-inspired methods, and more) and proved that there is no further compression available within the strict bit-identity contract.
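
As a rough illustration of what an entropy-based bound looks like, the sketch below computes the order-0 Shannon entropy of a byte stream. This is a textbook measure, not the paper's methodology: it bounds only coders that treat each byte independently, and the sample data is invented for the example.

```python
import math
from collections import Counter


def entropy_bits_per_byte(data: bytes) -> float:
    """Order-0 Shannon entropy: the average bits needed per byte by any
    coder that models each byte independently of its neighbours."""
    n = len(data)
    counts = Counter(data)
    return -sum(c / n * math.log2(c / n) for c in counts.values())


# A skewed stream, loosely like an exponent byte that takes few values.
skewed = bytes([60] * 700 + [61] * 200 + [59] * 100)
# A flat stream where every byte value is equally likely.
uniform = bytes(range(256))

h_skewed = entropy_bits_per_byte(skewed)
h_uniform = entropy_bits_per_byte(uniform)
print(f"skewed: {h_skewed:.3f} bits/byte, best ratio {h_skewed / 8:.1%}")
print(f"uniform: {h_uniform:.3f} bits/byte, best ratio {h_uniform / 8:.1%}")
```

The skewed stream needs a little over one bit per byte, while the uniform stream needs all eight, which is why skewed fields compress and flat ones do not.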

Full findings, all experiments, all dead-ends: **[10.5281/zenodo.20279248](https://doi.org/10.5281/zenodo.20279248)**. Plain-English summary: [docs/research.md](docs/research.md).

---

## Install

```bash
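pip install bigsmall                  # core
pip install "bigsmall[torch]"         # + PyTorch
pip install "bigsmall[hf]"            # + HuggingFace integration
pip install "bigsmall[diffusion]"     # + diffusers
pip install "bigsmall[vllm]"          # + vLLM
pip install "bigsmall[ecc]"           # + Reed-Solomon error recovery
pip install "bigsmall[all]"           # torch + hf + diffusion + ecc (vllm is separate)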
```

**Requires** Python 3.9+. Works on Linux, macOS, and Windows. Runs on CPU, NVIDIA, AMD, and Apple Silicon.

---

## License

Code: [Elastic License 2.0](LICENSE). Free for personal, research, and commercial use. SaaS providers should see [LICENSING.md](LICENSING.md).

Model weights distributed in `.bs` format keep the license of the original model.

---

## Links

- **PyPI** — https://pypi.org/project/bigsmall/
- **GitHub** — https://github.com/wpferrell/Bigsmall
- **HuggingFace** — https://huggingface.co/wpferrell
- **Paper / DOI** — https://doi.org/10.5281/zenodo.20279248
- **Docs** — [docs/](docs/)
- **Changelog** — [CHANGELOG.md](CHANGELOG.md)
- **Contact** — wpferrell@gmail.com
