Metadata-Version: 2.4
Name: turboqvec
Version: 0.1.0
Classifier: Programming Language :: Rust
Classifier: Programming Language :: Python :: Implementation :: CPython
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Requires-Dist: numpy>=1.24
License-File: LICENSE
Summary: Compressed vector store using PolarQuant — 8-15x smaller float arrays with fast inner product queries
Keywords: vector,quantization,embedding,compression,rust
Author: Manoj Krishnamohan
License: MIT
Requires-Python: >=3.9
Description-Content-Type: text/markdown; charset=UTF-8; variant=GFM
Project-URL: Homepage, https://github.com/Manojython/turboqvec-rs
Project-URL: Repository, https://github.com/Manojython/turboqvec-rs

# turboqvec-rs

Compressed vector store for Python, backed by Rust. Store large float arrays at **8–15x smaller footprint** and run cosine similarity queries directly on the compressed form — no decompression needed.

Built on [PolarQuant](https://arxiv.org/abs/2504.19874) (TurboQuant Stage 1, Google Research, ICLR 2026). Rust core with PyO3 bindings and Rayon parallel scan.

---

## Install

```bash
pip install turboqvec
```

Build from source:

```bash
git clone https://github.com/Manojython/turboqvec-rs
cd turboqvec-rs
pip install maturin
maturin develop --release
```

---

## Usage

```python
from turboqvec import CompressedVectorStore

store = CompressedVectorStore(dim=384, bits=4)

# Insert — accepts lists or numpy arrays
store.insert(id=0, vector=my_embedding)
store.insert_batch(pairs)  # list of (id, vector) tuples; parallel, faster

# Query — returns (id, score) tuples sorted by cosine similarity
results = store.query(query_embedding, top_k=10)

# Stats
print(f"{len(store):,} vectors, {store.memory_bytes() / 1e6:.1f} MB compressed")
```

See [`notebooks/quickstart.ipynb`](notebooks/quickstart.ipynb) for a full walkthrough with real embeddings.

---

## Accuracy

Validated on 500 BGE-small-en-v1.5 sentence embeddings (d=384, 20newsgroups).

| bits | R@1 | R@5 | R@10 | Mean cosine error | Compression |
|------|-----|-----|------|-------------------|-------------|
| 4-bit | **95%** | 96% | 95% | 0.004 | 7.8x |
| 3-bit | 87% | 89% | 92% | 0.011 | 10.4x |
| 2-bit | 73% | 81% | 85% | 0.031 | 15.4x |

Pearson/Spearman correlation of scores vs true cosine: **0.998 at 4-bit**.

> PolarQuant slightly underestimates cosine similarity (mean drift ≈ -0.003 at 4-bit). Rank order is preserved so top-k retrieval is not affected. For absolute threshold queries (e.g. score > 0.7), add ~0.003 as a calibration offset.
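
As a concrete example, a minimal sketch of a calibrated threshold query, reusing `store` and `query_embedding` from the Usage section (the `0.003` constant is the 4-bit drift figure quoted above):

```python
# Hypothetical calibration for absolute-threshold filtering at 4-bit.
CALIBRATION = 0.003

results = store.query(query_embedding, top_k=100)
matches = [(vid, score) for vid, score in results if score + CALIBRATION >= 0.7]
```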

---

## Storage

| n vectors | fp32 | 4-bit | 3-bit | 2-bit |
|-----------|------|-------|-------|-------|
| 1,000 | 1.5 MB | 0.20 MB | 0.15 MB | 0.10 MB |
| 10,000 | 15.4 MB | 1.96 MB | 1.48 MB | 1.00 MB |
| 100,000 | 153.6 MB | 19.6 MB | 14.8 MB | 10.0 MB |
| 1,000,000 | 1,536 MB | 196 MB | 148 MB | 100 MB |
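
These sizes follow directly from the encoding (see "How it works" below): each vector costs 4 bytes for its exact norm plus ⌈d·bits/8⌉ bytes of packed bin indices. A quick sanity check:

```python
import math

def compressed_mb(n: int, dim: int = 384, bits: int = 4) -> float:
    """4 bytes for the exact norm + ceil(dim * bits / 8) packed index bytes."""
    return n * (4 + math.ceil(dim * bits / 8)) / 1e6

print(compressed_mb(1_000_000, bits=4))  # 196.0 MB, vs 1536 MB at fp32
print(compressed_mb(1_000_000, bits=2))  # 100.0 MB
```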

---

## How it works

**Ingestion**

Each vector goes through: `norm → normalize → rotate (R·v̂) → Lloyd-Max quantize → bit-pack`. The rotation matrix R is a fixed random orthogonal matrix (from a QR decomposition) that decorrelates the dimensions, so a single codebook fits every coordinate position. Only the norm (4 bytes, exact) and the packed bin indices are stored.
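
A rough NumPy sketch of the same pipeline (illustrative only: the uniform codebook below is a placeholder where the Rust core fits Lloyd-Max centroids, and the final bit packing is omitted):

```python
import numpy as np

rng = np.random.default_rng(0)
d, bits = 384, 4

# Fixed random orthogonal rotation: QR decomposition of a Gaussian matrix.
R, _ = np.linalg.qr(rng.standard_normal((d, d)))

# Placeholder codebook: 2**bits uniform levels scaled to the roughly
# N(0, 1/d) spread of rotated unit-vector coordinates; the real store
# fits Lloyd-Max centroids to this distribution instead.
centroids = np.linspace(-3.0, 3.0, 2**bits) / np.sqrt(d)

def encode(v: np.ndarray) -> tuple[float, np.ndarray]:
    norm = float(np.linalg.norm(v))
    rotated = R @ (v / norm)                                   # rotate the unit vector
    idx = np.abs(rotated[:, None] - centroids).argmin(axis=1)  # nearest centroid per dim
    return norm, idx.astype(np.uint8)                          # stored: norm + bin indices
```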

**Retrieval**

For a query `q`, we build a lookup table once: `lut[i][b] = (R·q)[i] × centroid[b]`. Then the score of each database vector is a sum of d table lookups plus one multiply, with no per-vector matrix math. Rayon parallelizes the scan across all N entries.

This is asymmetric distance computation, the same idea FAISS product quantization uses: one matrix multiply per query, regardless of N.
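
Continuing the sketch above, the query-side scan looks roughly like this (the Rust core does the same over bit-packed indices, with the outer loop parallelized by Rayon):

```python
def query_scores(q: np.ndarray, norms: np.ndarray, indices: np.ndarray) -> np.ndarray:
    """norms: (N,) stored norms; indices: (N, d) unpacked bin indices."""
    rq = R @ q                                     # one matrix multiply per query
    lut = rq[:, None] * centroids[None, :]         # lut[i][b] = (R·q)[i] * centroid[b]
    sums = lut[np.arange(d), indices].sum(axis=1)  # d table lookups per vector: ≈ q · v̂
    return sums * norms                            # the one multiply: ≈ q · v
```

The `sums` term alone already ranks vectors by cosine similarity (it equals |q|·cos); multiplying by the stored norm recovers the raw inner product.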

**Why not QJL (Stage 2)?**

TurboQuant's Stage 2 (QJL) adds a 1-bit residual sketch aimed at KV-cache attention vectors. For normalized sentence embeddings the Stage 1 bias is already near zero; adding QJL introduced noise and dropped Pearson correlation from 0.996 to 0.48. Stage 1 alone is the right primitive for standard embeddings.

---

## Repo layout

```
turboqvec-rs/
├── src/               Rust source
│   ├── codebook.rs    Lloyd-Max quantizer, global cache
│   ├── rotation.rs    QR decomposition, global cache
│   ├── encode.rs      encode pipeline + bit packing + Rayon batch
│   ├── decode.rs      decode pipeline (for reconstruction)
│   ├── similarity.rs  asymmetric LUT scoring
│   ├── store.rs       CompressedVectorStore
│   └── py_bindings.rs PyO3 wrapper
├── python/turboqvec/  Python package
├── notebooks/         Quickstart notebook
├── benchmarks/        Comparison scripts vs fp32 baseline
└── tests/             Rust integration tests
```

---

## Related

- [fastrustrag](https://pypi.org/project/fastrustrag/) — MinHash deduplication for RAG pipelines
- [pageindex-rs](https://github.com/Manojython/pageindex-rs) — Hierarchical document indexing for LLMs
- [TurboQuant paper](https://arxiv.org/abs/2504.19874) — Google Research, ICLR 2026

---

## License

MIT

