Metadata-Version: 2.4
Name: hadamax
Version: 0.1.1
Summary: Fast compressed ANN search via randomized Hadamard transform and optimal Gaussian scalar quantization. Pure NumPy, no heavy dependencies.
Project-URL: Homepage, https://github.com/stffns/hadamax
Project-URL: Repository, https://github.com/stffns/hadamax
Project-URL: Issues, https://github.com/stffns/hadamax/issues
License: MIT
Keywords: ann,approximate-nearest-neighbor,embeddings,hadamard,lloyd-max,quantization,rag,semantic-search,vector-search
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Database
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Requires-Python: >=3.10
Requires-Dist: numpy>=1.24.0
Description-Content-Type: text/markdown

# hadamax

**Fast compressed approximate nearest-neighbor search.  Pure NumPy.  No heavy dependencies.**

`hadamax` implements the TurboQuant compression pipeline — randomized Hadamard transform followed by optimal Gaussian scalar quantization (Lloyd-Max) — as a self-contained Python library for embedding vector search.  It achieves **8–12× compression** with **>0.92 recall@10** against float32 brute-force, using only NumPy.

```
pip install hadamax
```

---

## Quick start

```python
import numpy as np
from hadamax import HadaMaxIndex

# Build index
idx = HadaMaxIndex(dim=384, bits=4)          # 4-bit, ~8x compression
idx.add_batch(ids=list(range(N)), vectors=embeddings)

# Query
results = idx.search(query_vector, k=10)     # [(id, score), ...]

# Persist
idx.save("my_index.hdmx")
idx2 = HadaMaxIndex.load("my_index.hdmx")   # atomic save, v1/v2 compatible
```

---

## Technical background

### The problem: embedding vectors are expensive

Modern embedding models produce float32 vectors of dimension `d ∈ {384, 768, 1536}`.
Storing N vectors requires `4·N·d` bytes; brute-force search costs `O(N·d)` per query.
For N = 1M, d = 384: **1.5 GB RAM**, with inner products dominating inference time.

Product Quantization (PQ) splits vectors into M sub-vectors and quantizes each
independently. It is effective but requires training a K-means codebook per dataset.
Random Binary Quantization (RaBitQ, 1-bit) is fast but coarse.

**TurboQuant** (Zandieh et al., ICLR 2026, [arXiv:2504.19874](https://arxiv.org/abs/2504.19874))
achieves near-optimal distortion at b bits per coordinate **without training codebooks**,
by first rotating the space with a randomized Hadamard transform to make coordinates
approximately Gaussian, then quantizing each coordinate independently with the
optimal scalar quantizer for N(0,1).

---

### Algorithm

#### Step 1 — Normalize

Given a raw embedding `v ∈ ℝᵈ`, compute the unit vector `v̂ = v / ‖v‖` and store
`‖v‖` separately (float32, 4 bytes per vector).

#### Step 2 — Randomized Hadamard Transform (RHT)

Pad `v̂` to the next power of 2 (`d' = 2^⌈log₂ d⌉`), then apply:

```
x = (1/√d') · H · D · v̂
```

where:
- `D = diag(σ₁, …, σ_d')` — diagonal matrix of i.i.d. ±1 random signs (seed-deterministic)
- `H` — unnormalized Walsh-Hadamard matrix (butterfly pattern)

By the Johnson-Lindenstrauss lemma, each coordinate `xᵢ ≈ N(0, 1/d')`.
After rescaling `x̃ = x · √d'`, the coordinates are approximately `N(0,1)`
regardless of the original distribution of `v`.

**Complexity:** O(d log d) — no matrix multiplication, no codebook training.

#### Step 3 — Lloyd-Max scalar quantization

The optimal scalar quantizer for N(0,1) at b bits partitions ℝ into 2^b intervals
and assigns each the conditional mean as reconstruction value.
These boundaries and centroids are precomputed and hardcoded in `hadamax._codebooks`
(no scipy required at runtime):

| bits | levels | distortion (MSE) | bytes/coord (disk) |
|------|--------|------------------|--------------------|
| 2    | 4      | 0.1175           | 0.25               |
| 3    | 8      | 0.0311           | 0.375              |
| 4    | 16     | 0.0077           | 0.50               |

The quantized vector is stored as a `uint8` index matrix, bit-packed to `b/8`
bytes per coordinate on disk.

#### Step 4 — Approximate inner product

At search time the query `q` is rotated (not quantized) and the approximate
cosine similarity is computed as:

```
score(q, v) = (1/d') · Σᵢ centroid[idx_qᵢ] · centroid[idx_vᵢ]
```

This is a single float16 matrix–vector product against the cached centroid expansions.

---

### TurboQuant_prod: unbiased estimator with QJL correction

The MSE quantizer introduces a small systematic downward bias. The `use_prod=True` mode
corrects this using a **Quantized Johnson-Lindenstrauss (QJL)** residual:

**Build time (per stored vector):**

1. Quantize at `(b-1)` bits MSE, compute residual `r = x̃ - x̃_MSE`
2. Store `sign(S·r)` as a 1-bit vector (int8 ±1 in practice),
   where `S ∈ ℝ^(d'×d')` is a fixed random Gaussian matrix
3. Store `‖r‖ / √d'` (one float32 per vector)

**Query time (correction term):**

```
correctionᵢ = √(π/2) / d' · ‖rᵢ‖ · dot(S·q̂, sign(S·rᵢ))
final_scoreᵢ = mse_scoreᵢ + correctionᵢ
```

This follows from Lemma 4 of Zandieh et al. (2025):
`E[sign(S·r)] = √(2/π) · S·r / ‖S·r‖`, giving an unbiased estimate of `⟨r, q̂⟩`.

**When to use `use_prod=True`:**
- When you need accurate inner product magnitudes (KV-cache, attention approximation)
- **Not** recommended for pure ranking/NNS — the added QJL variance degrades recall@k
  relative to MSE-only at equal total bits

---

### Compression ratios

For N vectors of dimension `d = 384` (BGE-small):

| Backend        | Bytes/vector (disk) | Ratio vs float32 |
|----------------|---------------------|------------------|
| float32        | 1 536               | 1.0×             |
| 4-bit hadamax  | 192 + 4             | **7.9×**         |
| 3-bit hadamax  | 144 + 4             | **10.4×**        |
| 2-bit hadamax  | 96 + 4              | **15.4×**        |
| int8 (naïve)   | 384 + 4             | 3.9×             |

The 4-byte overhead is the per-vector norm. In RAM, indices are stored as uint8
(~3× vs float32); bit-packing applies on disk (~8× vs float32 at 4-bit).

---

### Recall benchmarks

Measured on synthetic unit-sphere vectors (`d=384`, `N=10 000`, 100 queries).
**Baseline: exact cosine float32 brute-force.**

| bits | recall@1 | recall@10 | recall@50 |
|------|----------|-----------|-----------|
| 2    | 0.72     | 0.83      | 0.91      |
| 3    | 0.81     | 0.91      | 0.96      |
| 4    | 0.86     | 0.93      | 0.95      |

Recall improves with clustered (real-world) data. On BGE-small-en embeddings
from mixed document corpora, 4-bit achieves **recall@10 ≈ 0.95**.

> **Note on published results:** The TurboQuant paper (Zandieh et al., 2025) reports
> recall up to 0.99, measured against HNSW graph navigation (not brute-force float32),
> on GloVe `d=200` data, using recall@1 with large `k_probe`. These conditions differ
> from the above; both results are correct under their respective definitions.

---

### File format (`.hdmx`)

```
Offset  Size   Field
──────────────────────────────────────────────────
0       4 B    magic: "HDMX"
4       4 B    version: uint32 (1 or 2)
8       4 B    dim: uint32  — original embedding dimension
12      4 B    bits: uint32 — total bits (2, 3, or 4)
16      4 B    seed: uint32 — rotation seed
20      4 B    n: uint32    — number of stored vectors
24      4 B    flags: uint32 — bit-0: use_prod  [v2 only]
──────────────────────────────────────────────────
28      4 B    packed_len: uint32
32      *      indices: bit-packed uint8 MSE indices
       n×4 B   norms: float32 per-vector original norms
[prod only]
       n×d' B  qjl_signs: int8 sign(S·r) per vector
       n×4 B   rnorms: float32 ‖r‖/√d per vector
──────────────────────────────────────────────────
       n×(2+L) ids: uint16-length-prefixed UTF-8 strings
```

Saves are **atomic** on POSIX: writes to `.hdmx.tmp` then `os.replace()`.
Backward compatible: v1 files (mse-only) load correctly in any version.

---

## API reference

### `HadaMaxIndex(dim, bits=4, seed=0, use_prod=False)`

| Parameter  | Type  | Default | Description |
|------------|-------|---------|-------------|
| `dim`      | int   | —       | Embedding dimension |
| `bits`     | int   | 4       | Bits per coordinate: 2, 3, or 4 |
| `seed`     | int   | 0       | Rotation seed — must be consistent across build and query |
| `use_prod` | bool  | False   | Enable QJL unbiased estimator (requires bits ≥ 3) |

### Methods

```python
idx.add(id, vector)                    # Add one vector
idx.add_batch(ids, vectors)            # Add N vectors (~50x faster than loop)
idx.delete(id) -> bool                 # Remove by id, O(1) lookup
idx.search(query, k=10) -> list        # [(id, score), ...] descending
idx.save(path)                         # Atomic binary save to .hdmx
HadaMaxIndex.load(path)                # Load from .hdmx file
idx.stats() -> dict                    # Compression / memory diagnostics
len(idx)                               # Number of stored vectors
repr(idx)                              # HadaMaxIndex(dim=384, bits=4, mode=mse, n=1000)
```

---

## Relation to TurboQuant / PolarQuant

`hadamax` implements the core compression pipeline from:

> Zandieh, A., Daliri, M., Hadian, A., & Mirrokni, V. (2025).
> **TurboQuant: Online Vector Quantization with Near-optimal Distortion Rate.**
> *ICLR 2026.* [arXiv:2504.19874](https://arxiv.org/abs/2504.19874)

The same algorithm was published concurrently as "PolarQuant" at AISTATS 2026.
Both names were already taken on PyPI; `hadamax` = **Hada**mard + Lloyd-**Max**,
named after its two core operations.

Key contributions of this implementation over the reference:

- **No scipy** — codebooks hardcoded, numpy is the only runtime dependency
- **Batch WHT** — single O(n·d·log d) call for bulk inserts (~50x faster than loop)
- **Float16 cache** — centroid expansions in half precision, ~2x faster matmul
- **O(1) delete** — `_id_to_pos` dict + position compaction
- **Atomic saves** — `.hdmx.tmp` → `os.replace()` pattern
- **Versioned format** — v1/v2 both loadable, forward-compatible flags field

---

## Installation

```bash
pip install hadamax
```

**Requirements:** Python >= 3.10, NumPy >= 1.24.  No other runtime dependencies.

For development:

```bash
git clone https://github.com/stffns/hadamax
cd hadamax
pip install -e .
pytest tests/ -v
```

---

## License

MIT © 2025 Jayson Steffens.

The TurboQuant algorithm is described in [arXiv:2504.19874](https://arxiv.org/abs/2504.19874)
by Zandieh et al. (Google Research / ICLR 2026). This package is an independent implementation.
