Metadata-Version: 2.4
Name: quanda-vector-db
Version: 0.1.0
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Rust
Classifier: Programming Language :: Python :: Implementation :: CPython
Classifier: Programming Language :: Python :: Implementation :: PyPy
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Topic :: Database
Summary: WAL-backed vector store for LLM / RAG applications, written in Rust
Keywords: vector-db,embeddings,llm,rag,hnsw,rust
License: MIT
Requires-Python: >=3.8
Description-Content-Type: text/markdown; charset=UTF-8; variant=GFM

# quanda-vector-db

A minimal vector store for LLM / RAG applications, written in Rust with Python bindings. The current release (0.1.0) is alpha.

## Features

- **WAL pattern** — inserts are O(1) file appends, no index rebuild per insert
- **HNSW index** via [`instant-distance`](https://crates.io/crates/instant-distance) — approximate nearest-neighbour search
- **Upsert / delete** — no duplicate IDs; delete is WAL-backed and crash-safe
- **Dimension validation** — stored in snapshot, enforced on every write and query
- **Crash recovery** — on restart, snapshot is loaded and the WAL is replayed
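
The WAL, upsert / delete, dimension-validation, and crash-recovery bullets above can be illustrated with a small pure-Python model. This is a sketch of the general pattern, not the library's Rust implementation: `ToyWalStore`, the one-JSON-object-per-line log layout, and the `{"id": ...}` payload for deletes are assumptions made for illustration.

```python
import json
import os
import tempfile


class ToyWalStore:
    """Toy model of a WAL-backed vector store (illustration only)."""

    def __init__(self, wal_path):
        self.wal_path = wal_path
        self.entries = {}  # id -> embedding; a dict gives upsert semantics
        self.dim = None    # fixed by the first upsert
        self._replay()

    def _apply(self, op):
        data = op["data"]
        if op["op"] == "upsert":
            if self.dim is None:
                self.dim = len(data["embedding"])
            self.entries[data["id"]] = data["embedding"]
        elif op["op"] == "delete":
            self.entries.pop(data["id"], None)

    def _replay(self):
        # Crash recovery: rebuild the in-memory state from the log.
        if not os.path.exists(self.wal_path):
            return
        with open(self.wal_path, encoding="utf-8") as f:
            for line in f:
                if line.strip():
                    self._apply(json.loads(line))

    def _log(self, op):
        # A write is one append to the log; no index is rebuilt here.
        with open(self.wal_path, "a", encoding="utf-8") as f:
            f.write(json.dumps(op) + "\n")
            f.flush()
            os.fsync(f.fileno())
        self._apply(op)

    def upsert(self, entry_id, embedding):
        if self.dim is not None and len(embedding) != self.dim:
            raise ValueError(f"expected dimension {self.dim}, got {len(embedding)}")
        self._log({"op": "upsert", "data": {"id": entry_id, "embedding": list(embedding)}})

    def delete(self, entry_id):
        self._log({"op": "delete", "data": {"id": entry_id}})


wal = os.path.join(tempfile.mkdtemp(), "toy.wal")
store = ToyWalStore(wal)
store.upsert("a", [0.1, 0.2])
store.upsert("a", [0.3, 0.4])  # replaces the first "a", no duplicate
store.upsert("b", [0.5, 0.6])
store.delete("b")

recovered = ToyWalStore(wal)   # simulate a restart: state comes from the log
print(recovered.entries)       # {'a': [0.3, 0.4]}
```

Because every mutation is logged before it is applied in memory, reopening the same file reproduces the same state, which is the property the crash-recovery bullet relies on.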

## Installation

```bash
pip install quanda-vector-db
```

## Quick start

```python
from quanda_vector_db import VectorStore, VectorEntry

# Open (or create) a store backed by two files: store.snap + store.wal
store = VectorStore.open("store.snap", "store.wal")

# Upsert: replaces any existing entry with the same id.
# Every embedding in a store must have the same dimension (64 here).
store.upsert(VectorEntry(id="doc1", embedding=[0.1] * 64, metadata="hello"))
store.upsert(VectorEntry(id="doc1", embedding=[0.9] * 64, metadata="updated"))

# Batch upsert — single WAL write for the whole batch (fast for 100 K+ vectors)
entries = [
    VectorEntry(id=f"doc{i}", embedding=[i * 0.01] * 64, metadata=f"chunk {i}")
    for i in range(10_000)
]
store.upsert_batch(entries)

# Compact: merge WAL → snapshot, rebuild HNSW index once
store.compact()

# Search: returns [(VectorEntry, distance), ...]
results = store.search(query=[0.5] * 64, top_k=5)
for entry, dist in results:
    print(entry.id, entry.metadata, f"dist={dist:.4f}")

# Delete
store.delete("doc1")

# Inspection
print(store.len)        # total live entries
print(store.dim)        # embedding dimension (None if empty)
print(store.wal_pending)  # ops not yet compacted
```

## Write pattern for large datasets

```python
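from quanda_vector_db import VectorStore

store = VectorStore.open("store.snap", "store.wal")

# `batches_of`, `my_vectors`, and `queries` are placeholders for your own
# batching helper, list of VectorEntry objects, and query embeddings.

# 1. Insert fast: WAL-only, no HNSW rebuild
for batch in batches_of(my_vectors, size=1_000):
    store.upsert_batch(batch)

# 2. Compact once: rebuild HNSW from all entries
store.compact()

# 3. Search many times against the stable index
for query in queries:
    results = store.search(query, top_k=10)
```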
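
`batches_of` is not defined anywhere in this README. One possible plain-Python version is sketched below; the name and signature simply mirror the call in the loop above and are not part of the documented `quanda_vector_db` API.

```python
from itertools import islice


def batches_of(items, size):
    """Yield successive lists of at most `size` items from any iterable."""
    if size < 1:
        raise ValueError("size must be at least 1")
    it = iter(items)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


print([len(b) for b in batches_of(range(2_500), size=1_000)])  # [1000, 1000, 500]
```

Since it accepts any iterable, the same helper works when the entries come from a generator rather than a list held fully in memory.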

## Storage files

| File | Role |
|---|---|
| `store.snap` | Compacted JSON snapshot `{"dim": N, "entries": [...]}` |
| `store.wal` | Append-only log of mutations `{"op": "upsert"/"delete", "data": {...}}` |
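
To make the two file roles concrete, here is a rough sketch of what compaction could look like when expressed over these JSON shapes. It is an illustration, not the library's code: `compact_files` is a hypothetical helper, and it assumes the WAL holds one JSON object per line, that snapshot entries are objects with `id` and `embedding` keys, and that a delete record carries `{"id": ...}`.

```python
import json
import os
import tempfile


def compact_files(snapshot_path, wal_path):
    """Fold WAL operations into the snapshot, then empty the WAL."""
    snapshot = {"dim": None, "entries": []}
    if os.path.exists(snapshot_path):
        with open(snapshot_path, encoding="utf-8") as f:
            snapshot = json.load(f)

    # Start from the snapshot, then apply logged mutations in order.
    live = {entry["id"]: entry for entry in snapshot["entries"]}
    if os.path.exists(wal_path):
        with open(wal_path, encoding="utf-8") as f:
            for line in f:
                if not line.strip():
                    continue
                op = json.loads(line)
                if op["op"] == "upsert":
                    live[op["data"]["id"]] = op["data"]
                elif op["op"] == "delete":
                    live.pop(op["data"]["id"], None)

    entries = list(live.values())
    dim = len(entries[0]["embedding"]) if entries else snapshot["dim"]

    # Write to a temporary file and rename it, so a crash cannot leave a
    # half-written snapshot behind.
    tmp_path = snapshot_path + ".tmp"
    with open(tmp_path, "w", encoding="utf-8") as f:
        json.dump({"dim": dim, "entries": entries}, f)
    os.replace(tmp_path, snapshot_path)

    # Only now truncate the WAL; its operations live in the snapshot.
    open(wal_path, "w", encoding="utf-8").close()
    return len(entries)


workdir = tempfile.mkdtemp()
snap = os.path.join(workdir, "store.snap")
wal = os.path.join(workdir, "store.wal")
with open(wal, "w", encoding="utf-8") as f:
    for op in (
        {"op": "upsert", "data": {"id": "doc1", "embedding": [0.1, 0.2]}},
        {"op": "upsert", "data": {"id": "doc2", "embedding": [0.3, 0.4]}},
        {"op": "delete", "data": {"id": "doc1"}},
    ):
        f.write(json.dumps(op) + "\n")

print(compact_files(snap, wal))  # 1
```

In this sketch the snapshot is written before the WAL is truncated. If the process dies between the two steps, replaying the old WAL over the new snapshot yields the same state, because upserts and deletes by ID are idempotent.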

## API reference

### `VectorEntry(id, embedding, metadata="", created_at="")`

| Attribute | Type | Description |
|---|---|---|
| `id` | `str` | Unique identifier |
| `embedding` | `list[float]` | Vector |
| `metadata` | `str` | Arbitrary string payload |
| `created_at` | `str` | Timestamp (user-supplied) |

### `VectorStore`

| Method / Property | Description |
|---|---|
| `VectorStore.new(snapshot_path, wal_path)` | Fresh empty store |
| `VectorStore.open(snapshot_path, wal_path)` | Load snapshot + replay WAL |
| `.upsert(entry)` | Insert or replace by ID (WAL append) |
| `.upsert_batch(entries)` | Batch upsert (single WAL write) |
| `.delete(id)` | Delete by ID (WAL append) |
| `.get(id)` | Point lookup, returns `VectorEntry` or `None` |
| `.search(query, top_k=10)` | ANN search, returns `list[tuple[VectorEntry, float]]` |
| `.compact()` | Merge WAL → snapshot, rebuild HNSW |
| `.len` | Number of live entries |
| `.dim` | Embedding dimension or `None` |
| `.wal_pending` | Ops not yet compacted |
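
Because `.search` is approximate, an exact brute-force pass over a small sample can be a useful sanity check on its results. The sketch below is independent of the library: `exact_search` is a hypothetical helper, it works on plain `(id, embedding)` pairs, and it assumes Euclidean distance, which may not be the metric the store uses.

```python
import math


def exact_search(entries, query, top_k=10):
    """Exact nearest neighbours by Euclidean distance over (id, embedding) pairs."""
    scored = []
    for entry_id, embedding in entries:
        if len(embedding) != len(query):
            raise ValueError(f"dimension mismatch for {entry_id!r}")
        scored.append((entry_id, math.dist(embedding, query)))
    scored.sort(key=lambda pair: pair[1])  # closest first
    return scored[:top_k]


entries = [("a", [0.0, 0.0]), ("b", [1.0, 0.0]), ("c", [3.0, 4.0])]
print(exact_search(entries, [0.0, 0.0], top_k=2))  # [('a', 0.0), ('b', 1.0)]
```

Comparing the IDs returned here with those from `store.search` on the same query gives a rough recall estimate, provided the distance assumption holds.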

## Development

Requires Rust stable and `maturin`:

```bash
python -m venv .venv && source .venv/bin/activate
pip install maturin
maturin develop --release
python -c "import quanda_vector_db; print(quanda_vector_db.__version__)"
```

## License

MIT

