Metadata-Version: 2.4
Name: ordvec
Version: 0.3.0
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: MIT License
Classifier: License :: OSI Approved :: Apache Software License
Classifier: Operating System :: POSIX :: Linux
Classifier: Operating System :: MacOS
Classifier: Operating System :: Microsoft :: Windows
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Programming Language :: Rust
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Requires-Dist: numpy>=2.2
License-File: LICENSE-APACHE-2.0
License-File: LICENSE-MIT
Summary: Training-free ordinal & sign quantization for compressed vector retrieval
Keywords: vector-search,quantization,nearest-neighbor,ann,simd,ordinal,rank,embeddings
Author: Nelson Spence
License: MIT OR Apache-2.0
Requires-Python: >=3.10
Description-Content-Type: text/markdown; charset=UTF-8; variant=GFM
Project-URL: Formalization, https://github.com/Fieldnote-Echo/ordvec-formalization
Project-URL: Homepage, https://github.com/Fieldnote-Echo/ordvec
Project-URL: Issues, https://github.com/Fieldnote-Echo/ordvec/issues
Project-URL: Repository, https://github.com/Fieldnote-Echo/ordvec

# ordvec (Python)

Python bindings for [`ordvec`](https://github.com/Fieldnote-Echo/ordvec) — a
training-free **ordinal & sign** vector-quantization library for compressed
nearest-neighbour retrieval over high-dimensional embeddings. The core is pure
Rust with zero system dependencies, and the SIMD path (AVX-512, AVX2, or a
scalar fallback) is chosen at runtime.

```python
import numpy as np
import ordvec

q = ordvec.RankQuant(1024, 2)          # 1024-dim, 2 bits/coord
q.add(np.random.randn(10_000, 1024).astype(np.float32))
# asymmetric: full-precision float queries vs bucketed docs (recommended)
scores, ids = q.search_asymmetric(np.random.randn(8, 1024).astype(np.float32), k=10)
```

## Classes

| Class | Purpose |
|-------|---------|
| `Rank` | Full-precision rank vectors (u16 per coordinate). |
| `RankQuant` | Bucketed ranks, `bits` ∈ {1, 2, 4}; symmetric + asymmetric (float-query LUT) scoring. |
| `Bitmap` | Constant-weight top-bucket bitmap per document; `popcount(Q AND D)` candidate scoring. |
| `SignBitmap` | Sign bitmap for sign-cosine candidate generation; not covered by the constant-weight bitmap theorem. |
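
To make the representations in the table concrete, here is a minimal NumPy sketch of what each one encodes. It is illustrative only and does not use or reproduce the library's implementation: the function names are hypothetical, and the tie handling, bucket boundaries, and sign convention are assumptions.

```python
import numpy as np


def rank_vector(x: np.ndarray) -> np.ndarray:
    """Ordinal rank of each coordinate (0 = smallest value)."""
    order = np.argsort(x, kind="stable")  # assumed tie-break: lower index first
    ranks = np.empty(x.shape[0], dtype=np.uint16)
    ranks[order] = np.arange(x.shape[0], dtype=np.uint16)
    return ranks


def bucket_ranks(ranks: np.ndarray, bits: int) -> np.ndarray:
    """Collapse ranks into 2**bits equal-width buckets (assumed boundaries)."""
    dim = ranks.shape[0]
    return (ranks.astype(np.uint32) * (1 << bits) // dim).astype(np.uint8)


def top_bucket_bitmap(buckets: np.ndarray, bits: int) -> np.ndarray:
    """Boolean mask of coordinates in the highest bucket."""
    return buckets == (1 << bits) - 1


def sign_bitmap(x: np.ndarray) -> np.ndarray:
    """Boolean mask of strictly positive coordinates (assumed convention)."""
    return x > 0


def overlap(q_bits: np.ndarray, d_bits: np.ndarray) -> int:
    """popcount(Q AND D) on unpacked boolean bitmaps."""
    return int(np.count_nonzero(q_bits & d_bits))


x = np.array([0.3, -1.2, 2.5, 0.0, -0.4, 1.1, 0.9, -2.0], dtype=np.float32)
y = x[::-1].copy()

x_buckets = bucket_ranks(rank_vector(x), bits=2)
y_buckets = bucket_ranks(rank_vector(y), bits=2)
x_top = top_bucket_bitmap(x_buckets, bits=2)
y_top = top_bucket_bitmap(y_buckets, bits=2)
shared = overlap(x_top, y_top)
```

When the dimension is divisible by `2**bits`, every top-bucket bitmap in this sketch has exactly `dim / 2**bits` set bits, which is the "constant-weight" property. A sign bitmap has no such fixed weight.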

## Theory and calibration

`Bitmap` exposes the constant-weight top-bucket overlap statistic formalized in
[`ordvec-formalization`](https://github.com/Fieldnote-Echo/ordvec-formalization).
In that finite Lean model, literal bitmap overlap is the query-preserving
quotient statistic, an overlap threshold is Bayes-optimal under explicit
monotone-overlap assumptions, and the idealized uniform constant-weight null
calibrates that threshold by the hypergeometric upper tail.

This is not a deployment guarantee for every encoder or corpus. Real-corpus
recall, monotonicity, and null fit remain empirical diagnostics.
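
As a rough illustration of the hypergeometric calibration described above, the sketch below computes the upper tail for two constant-weight bitmaps under the idealized uniform null. It uses only the standard library and is independent of `ordvec`; the function names and the choice of equal query and document weights are assumptions made for the example.

```python
from math import comb


def hypergeom_upper_tail(dim: int, weight: int, threshold: int) -> float:
    """P(overlap >= threshold) when two weight-`weight` bitmaps of length
    `dim` are drawn independently and uniformly (the idealized null)."""
    total = comb(dim, weight)
    hits = sum(
        comb(weight, k) * comb(dim - weight, weight - k)
        for k in range(max(threshold, 0), weight + 1)
    )
    return hits / total


def min_threshold(dim: int, weight: int, alpha: float) -> int:
    """Smallest overlap threshold whose null tail probability is <= alpha.
    Returns weight + 1 if no achievable overlap meets that level."""
    for t in range(weight + 1):
        if hypergeom_upper_tail(dim, weight, t) <= alpha:
            return t
    return weight + 1


# Example: 1024 dimensions with 256 set bits per bitmap (e.g. a top quarter).
t_star = min_threshold(1024, 256, alpha=1e-6)
tail_at_t_star = hypergeom_upper_tail(1024, 256, t_star)
```

Under this null the expected overlap is `weight**2 / dim` (64 in the example), so any useful threshold sits above that value. Whether a real corpus follows the null closely enough for the threshold to be meaningful is the empirical question noted above.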

## Installation

```bash
pip install ordvec
```

Wheels target CPython 3.10+ (abi3) and require `numpy>=2.2`. Building from
source needs a Rust toolchain (MSRV 1.89) and
[maturin](https://www.maturin.rs/).

## Provenance & license

The `ordvec` Python bindings were originally developed within
[turbovec](https://github.com/RyanCodrai/turbovec) (MIT-licensed, by Ryan
Codrai) and later factored out into this standalone package. turbovec is
credited as the origin project.

Dual-licensed under either of
[MIT](https://github.com/Fieldnote-Echo/ordvec/blob/main/LICENSE-MIT) or
[Apache-2.0](https://github.com/Fieldnote-Echo/ordvec/blob/main/LICENSE-APACHE-2.0)
at your option.

