Metadata-Version: 2.4
Name: bitbudget
Version: 0.2.0
Summary: How much retrieval quality do you keep per byte? A reproducible benchmark for embedding compression.
Author: Sean Moran
License: MIT
Project-URL: Paper, https://arxiv.org/abs/2510.04127
Project-URL: Leaderboard, https://github.com/sjmoran/bitbudget/blob/main/LEADERBOARD.md
Keywords: retrieval,embeddings,quantisation,hashing,compression,ANN,RAG
Requires-Python: >=3.9
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: numpy>=1.21
Provides-Extra: embed
Requires-Dist: sentence-transformers>=2.2; extra == "embed"
Provides-Extra: faiss
Requires-Dist: faiss-cpu>=1.7.4; extra == "faiss"
Provides-Extra: all
Requires-Dist: sentence-transformers>=2.2; extra == "all"
Requires-Dist: faiss-cpu>=1.7.4; extra == "all"
Dynamic: license-file

# BitBudget

**How much retrieval quality do you keep per byte?**

BitBudget is a small, reproducible benchmark for **embedding compression**. Give it an
embedder and a corpus and it reports the retrieval quality (nDCG@10, recall@10) that each
compression method retains against the **bytes it stores per vector** — the recall‑per‑byte
frontier that every RAG and vector‑database deployment actually lives on.

It is the companion benchmark to the survey *“Projection and Quantisation: A Unifying View of
Learning to Hash, from Random Projections to the RAG Era”* and exists to answer one question
that today is mostly answered by vendor blog posts: **when you binarise / int8 / RaBitQ /
product‑quantise / Matryoshka‑truncate your embeddings, what do you actually lose?**

## The headline finding

> **Bits beat dimensions.** Spending a fixed byte budget on *more coarsely quantised*
> coordinates beats spending it on *fewer full‑precision* coordinates, at every budget and
> for every embedder we have tried. One‑bit codes with a cheap re‑ranking pass are **32×
> smaller than float at no measurable loss**.

```
mxbai‑embed‑large (1024‑d), mean over 4 BEIR corpora
  binary+rerank      128 B   nDCG 0.509   100% of float   ← 32× smaller, lossless
  pq                 128 B   nDCG 0.488    96%
  rabitq             128 B   nDCG 0.487    96%
  matryoshka        1024 B   nDCG 0.439    86%             ← 4× smaller, projection axis
  float32           4096 B   nDCG 0.508   100%
```

See **[LEADERBOARD.md](LEADERBOARD.md)** for the full table.
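
To make "one-bit codes with a cheap re-ranking pass" concrete, here is a minimal numpy sketch of one plausible reading of the idea: keep only the sign bit of each coordinate (`D/8` bytes per document), shortlist candidates by Hamming-style agreement, then re-score the shortlist with the full-precision query against the stored bits. It is an illustration of the general technique, not BitBudget's `binary+rerank` implementation, and the function names are hypothetical.

```python
import numpy as np

def pack_sign_bits(x):
    """One bit per coordinate (1 where x > 0), packed into D/8 uint8 bytes."""
    return np.packbits(x > 0, axis=1)

def unpack_to_pm1(codes, dim):
    """Expand packed codes back to a {-1, +1} float32 matrix."""
    bits = np.unpackbits(codes, axis=1)[:, :dim]
    return bits.astype(np.float32) * 2.0 - 1.0

def binary_rerank_search(qemb, dcodes, dim, k=10, n_cand=100):
    """Two-stage search that only ever reads the packed document codes."""
    dpm = unpack_to_pm1(dcodes, dim)                        # (docs, dim)
    qpm = np.where(qemb > 0, 1.0, -1.0).astype(np.float32)  # binarised queries
    # Stage 1: sign agreement. q_pm . d_pm = dim - 2 * Hamming distance.
    coarse = qpm @ dpm.T
    n_cand = min(n_cand, dpm.shape[0])
    cand = np.argpartition(-coarse, n_cand - 1, axis=1)[:, :n_cand]
    # Stage 2: re-score the shortlist with the float query (asymmetric scoring).
    fine = np.einsum("qd,qcd->qc", qemb, dpm[cand])
    order = np.argsort(-fine, axis=1)[:, :k]
    return np.take_along_axis(cand, order, axis=1)          # (queries, k) doc ids
```

For a 1024-d embedder, `pack_sign_bits` yields the 128 B per vector shown in the table above; the query stays in float and is never stored.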

## Install

```bash
pip install bitbudget            # evaluation only (numpy)
pip install "bitbudget[all]"     # + sentence-transformers (embedding) + faiss
```

## Quickstart

```bash
bitbudget methods                                   # list compression methods
bitbudget run --embedder mxbai --corpus scifact     # embed + evaluate, print a results card
bitbudget leaderboard results/card_*.json           # render a markdown leaderboard

bitbudget indexes                                   # list indexes (organisation axis)
bitbudget bench-index --synthetic 100000 128        # recall vs QPS vs bytes: flat/hnsw/ivfpq/bittrie
```

`run` embeds (torch) and evaluates (numpy) in one process. The corpora auto‑download.

### The organisation axis (`bench-index`)

The compression leaderboard answers *quality per byte*; `bench-index` answers the orthogonal
question of *recall per query-second*. It builds an index over the document vectors and reports recall@k,
throughput (QPS) and bytes per vector, so HNSW and IVF‑PQ (which buy throughput and *add* bytes)
can be compared against compact‑code indexes on one frontier. Run it on synthetic data, on a
cached embedding (`--embedder mxbai --corpus scifact`), or on your own vectors (`--npz`). The
faiss‑backed indexes need `pip install "bitbudget[faiss]"`; the numpy `bittrie` runs without it.
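
The two quantities on this frontier are easy to state in code. The sketch below measures recall@k against exact brute-force search and throughput in queries per second for any search function. It assumes inner-product similarity and a hypothetical `search_fn(queries, k)` callable that returns document ids; it is not the `bench-index` implementation.

```python
import time
import numpy as np

def exact_topk(queries, docs, k):
    """Brute-force ground truth: ids of the k highest inner products."""
    scores = queries @ docs.T
    return np.argsort(-scores, axis=1)[:, :k]

def recall_and_qps(search_fn, queries, docs, k=10):
    """Return (recall@k against exact search, queries per second)."""
    truth = exact_topk(queries, docs, k)
    start = time.perf_counter()
    found = search_fn(queries, k)            # hypothetical index under test
    elapsed = time.perf_counter() - start
    hits = sum(len(set(f) & set(t)) for f, t in zip(found, truth))
    recall = hits / (len(queries) * k)
    qps = len(queries) / max(elapsed, 1e-9)
    return recall, qps
```

Passing the exact search itself as `search_fn` gives a recall of 1.0 and serves as the flat baseline that other indexes trade against.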

The `bittrie` index ships a small C kernel (`_bittrie.c`) for the query hot‑path. The kernel is
compiled on first use and cached, so no compiler is needed to *install*: the wheel stays
pure‑Python, and the index falls back to numpy if no compiler is present. The kernel builds
**multithreaded** when OpenMP is available (GCC/clang on Linux, Homebrew `libomp` on macOS) and
single‑threaded otherwise. Results are bit‑identical to the numpy path; recall and footprint are
properties of the algorithm and do not change either way.

Because faiss carries its own OpenMP runtime, it cannot share a process with the bit‑trie's
`libomp` on macOS. `bench-index` therefore runs the faiss indexes and the bit‑trie in **separate
subprocesses** and merges the results, so a single `bitbudget bench-index ...` works everywhere
(pass `--no-split` to force one process, e.g. on Linux where both share one OpenMP runtime).
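
The split-and-merge pattern can be sketched with the standard library alone. The helpers below are hypothetical and only illustrate the idea of running each backend in a fresh interpreter, so the two OpenMP runtimes never load together, and then combining the JSON each child prints. They are not the code `bench-index` uses.

```python
import json
import subprocess
import sys

def run_isolated(snippet):
    """Run a Python snippet in a fresh interpreter and parse the JSON it prints."""
    out = subprocess.run(
        [sys.executable, "-c", snippet],
        capture_output=True, text=True, check=True,
    )
    return json.loads(out.stdout)

def merge_results(*parts):
    """Combine per-process result dicts keyed by index name."""
    merged = {}
    for part in parts:
        merged.update(part)
    return merged
```

Each child imports only one of the conflicting libraries, and the parent process imports neither.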

> **macOS note.** torch and faiss each bundle their own OpenMP runtime and crash if imported
> in the same process. The core methods are numpy‑only, so `run` is safe; if you add a
> faiss‑backed method, run `bitbudget embed` (torch) and `bitbudget eval` (numpy/faiss)
> as separate processes.

## The protocol (frozen, so results are comparable)

- **Corpora:** the BEIR subsets `scifact`, `nfcorpus`, `arguana`, `fiqa` (small enough to run
  on a laptop, diverse enough to be honest). Numbers are the mean over corpora; `±` is the
  standard deviation across them.
- **Metrics:** `nDCG@10` against the graded BEIR judgements, and `recall@10` against the exact
  floating‑point neighbours. `% of float` is nDCG relative to the uncompressed embedding.
- **Memory:** bytes stored per document vector of dimension `D`: `4D` for float32, `D` for int8,
  `D/8` for binary, `M` for an `M`‑byte product code, and `4·dim` for a vector truncated or
  PCA‑reduced to `dim` float32 coordinates.
- **Embedders:** `minilm` (384‑d) and `mxbai` (1024‑d, Matryoshka) ship built in.
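
The memory and `% of float` definitions above reduce to a few lines of arithmetic. The sketch below restates them; the method labels and function names are illustrative assumptions, not BitBudget's registry names or API.

```python
def bytes_per_vector(kind, dim, m=None, kept_dim=None):
    """Bytes stored per document vector under the protocol's accounting."""
    if kind == "float32":
        return 4 * dim
    if kind == "int8":
        return dim
    if kind == "binary":
        return (dim + 7) // 8          # D/8, rounded up if D is not a multiple of 8
    if kind == "pq":
        return m                       # an M-byte product code
    if kind == "truncated":
        return 4 * kept_dim            # kept coordinates stay float32
    raise ValueError(f"unknown kind: {kind}")

def percent_of_float(ndcg_method, ndcg_float):
    """nDCG of a compressed method relative to the uncompressed embedding."""
    return 100.0 * ndcg_method / ndcg_float
```

For the 1024-d embedder in the headline table, `bytes_per_vector("float32", 1024)` is 4096 and `bytes_per_vector("binary", 1024)` is 128, which is the 32× ratio quoted there.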

## Add your method in five lines

This is the point of the benchmark: drop in your compressor and it is scored against every
built‑in on the same protocol.

```python
from bitbudget import method
import numpy as np

@method("my-2bit", bits=2)
def my_2bit(demb, qemb):                            # document and query embeddings
    codes = my_quantise(demb)                       # your compression (you supply this)
    scores = qemb @ my_reconstruct(codes).T         # (queries x docs) similarity
    return scores, demb.shape[1] * 2 / 8            # scores, bytes per stored vector
```
```

```bash
bitbudget run --embedder mxbai --corpus scifact --methods my-2bit binary+rerank float32
```
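
`my_quantise` and `my_reconstruct` in the snippet above are placeholders for your own code. As a starting point, here is a minimal numpy sketch of one possible 2-bit compressor: a uniform 4-level quantiser per coordinate, packed four codes per byte. The names are hypothetical, the per-coordinate `lo` and `scale` arrays are assumed to be shared across the corpus (so they are not counted per vector), and the embedding dimension is assumed to be a multiple of 4.

```python
import numpy as np

LEVELS = 4  # 2 bits per coordinate

def quantise_2bit(demb):
    """Uniform 4-level quantiser per coordinate, packed four codes per byte."""
    lo = demb.min(axis=0)
    scale = np.maximum(demb.max(axis=0) - lo, 1e-12) / (LEVELS - 1)
    q = np.clip(np.rint((demb - lo) / scale), 0, LEVELS - 1).astype(np.uint8)
    q = q.reshape(len(demb), -1, 4)                  # requires dim % 4 == 0
    packed = q[..., 0] | (q[..., 1] << 2) | (q[..., 2] << 4) | (q[..., 3] << 6)
    return packed, lo, scale

def reconstruct_2bit(packed, lo, scale):
    """Unpack the 2-bit codes and map them back to approximate floats."""
    shifts = np.array([0, 2, 4, 6], dtype=np.uint8)
    q = (packed[..., None] >> shifts) & 3
    return q.reshape(len(packed), -1).astype(np.float32) * scale + lo

def score_2bit(demb, qemb):
    """Same contract as the decorated function: (scores, bytes per vector)."""
    packed, lo, scale = quantise_2bit(demb)
    scores = qemb @ reconstruct_2bit(packed, lo, scale).T
    return scores, packed.shape[1]                   # dim * 2 / 8 bytes
```

The body of `score_2bit` is what would sit under the `@method("my-2bit", bits=2)` decorator; a min/max grid is sensitive to outliers, so treat it as a baseline to improve on.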

Then open a pull request adding your row to [LEADERBOARD.md](LEADERBOARD.md). See
[CONTRIBUTING.md](CONTRIBUTING.md).

## Cite

If BitBudget helps your work, please cite the survey:

```bibtex
@article{moran2025projection,
  title   = {Projection and Quantisation: A Unifying View of Learning to Hash,
             from Random Projections to the RAG Era},
  author  = {Moran, Sean},
  journal = {arXiv preprint arXiv:2510.04127},
  year    = {2025}
}
```

MIT licensed.
