Metadata-Version: 2.4
Name: lsmvec-client
Version: 0.1.0
Summary: Python client for the LSM-Vec vector database HTTP API
Author: LSM-Vec
License: Apache-2.0
Requires-Python: >=3.8
Description-Content-Type: text/markdown
Provides-Extra: numpy
Requires-Dist: numpy>=1.20; extra == "numpy"
Provides-Extra: dev
Requires-Dist: pytest>=7; extra == "dev"

# lsmvec-client — Python client for LSM-Vec

A thin, dependency-free Python client for the LSM-Vec vector database
HTTP API. Uses only the Python standard library; `numpy` is optional
(a convenience for `bulk_build`).

## Install

```bash
pip install lsmvec-client            # core, zero dependencies
pip install "lsmvec-client[numpy]"   # + numpy for bulk_build convenience (quoted so shells like zsh don't expand the brackets)
```

Or run straight from the repo without installing:

```python
import sys; sys.path.insert(0, "sdk/python")
from lsmvec_client import Client
```

## Quickstart

```python
from lsmvec_client import Client

client = Client(
    api_key="sk-live-...",                 # sent as Bearer token
    base_url="https://api.lsmvec.com",     # or http://localhost:8000 for local
)

# Insert with optional metadata
client.insert(1, [0.10, 0.20, 0.30, ...], metadata={"title": "intro"})

# Search
hits = client.search([0.10, 0.20, 0.30, ...], k=10)
for h in hits:
    print(h.id, h.distance)

# Filtered search (metadata predicate, same syntax as the HTTP API)
hits = client.search(
    [0.10, 0.20, ...], k=10,
    filter={"$and": [{"category": {"$eq": "docs"}}]},
)
```

## Bulk build (initial load)

The fastest way to populate a **new, empty** database. It builds the
whole index in memory (RNN-Descent) and writes it in one pass, which
is 2-3× faster than per-vector inserts and yields higher recall. It is
for the initial load only; the DB must be empty.

```python
import numpy as np
from lsmvec_client import Client

client = Client(base_url="http://localhost:8000")

vectors = np.random.rand(100_000, 128).astype(np.float32)
report = client.bulk_build(vectors, threads=4)
print(report)   # {'n': 100000, 'elapsed_ms': ..., 'vectors_per_sec': ..., 'threads': 4}
```

`bulk_build` also accepts a plain list of equal-length float lists
(no numpy required):

```python
rows = [[0.1, 0.2, 0.3, 0.4], [0.5, 0.6, 0.7, 0.8], ...]
client.bulk_build(rows)
```

For incremental updates on an already-built index, use `insert()` /
`upsert()` instead — `bulk_build` rejects a non-empty DB.
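
Because the list form requires rows of equal length, it can help to check the data locally before sending a large request. The sketch below is a minimal, standalone example; `check_rows` is a hypothetical helper, not part of `lsmvec_client`, and whether the client or server performs an equivalent check is not stated here.

```python
from typing import Sequence, Tuple


def check_rows(rows: Sequence[Sequence[float]]) -> Tuple[int, int]:
    """Return (n, dim) if every row has the same non-zero length, else raise ValueError."""
    if len(rows) == 0:
        raise ValueError("no vectors to load")
    dim = len(rows[0])
    if dim == 0:
        raise ValueError("vectors must have at least one element")
    for i, row in enumerate(rows):
        if len(row) != dim:
            raise ValueError(f"row {i} has length {len(row)}, expected {dim}")
    return len(rows), dim


rows = [[0.1, 0.2, 0.3, 0.4], [0.5, 0.6, 0.7, 0.8]]
n, dim = check_rows(rows)
print(n, dim)
```

After a successful check, `rows` can be passed to `client.bulk_build(rows)` as shown above.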

## API

| Method | HTTP | Notes |
|---|---|---|
| `insert(id, vector, metadata=None)` | `POST /v1/vectors` | metadata is any JSON object |
| `upsert(id, vector)` | `PUT /v1/vectors/:id` | insert-or-replace vector |
| `get(id) -> dict` | `GET /v1/vectors/:id` | dict with keys `id` and `vector` |
| `delete(id)` | `DELETE /v1/vectors/:id` | |
| `get_payload(id) -> dict` | `GET /v1/vectors/:id/payload` | |
| `set_payload(id, payload)` | `PUT /v1/vectors/:id/payload` | replace |
| `merge_payload(id, partial)` | `PATCH /v1/vectors/:id/payload` | RFC 7396 merge |
| `search(vector, k=10, ef_search=None, filter=None) -> [SearchResult]` | `POST /v1/search` | |
| `bulk_build(vectors, dim=None, threads=0) -> dict` | `POST /v1/build/bulk` | empty DB only |
| `stats() -> dict` | `GET /v1/stats` | tombstone / bloom counters |
| `health() -> bool` | `GET /health` | |
| `ready() -> bool` | `GET /ready` | DB open + responsive |

`search` returns a list of `SearchResult(id: int, distance: float)`.
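
The table lists `merge_payload` as an RFC 7396 merge. To illustrate what that implies for a stored payload, here is a small local sketch of the JSON Merge Patch algorithm from the RFC. It runs entirely in Python and does not call the server; `json_merge_patch` is a hypothetical name used only for this example, and the server's behaviour is assumed to follow the RFC as the table states.

```python
import copy
from typing import Any


def json_merge_patch(target: Any, patch: Any) -> Any:
    """Apply an RFC 7396 JSON Merge Patch and return the result (inputs untouched)."""
    if not isinstance(patch, dict):
        # A non-object patch (including a list) replaces the target wholesale.
        return copy.deepcopy(patch)
    # A non-object target is treated as an empty object.
    result = copy.deepcopy(target) if isinstance(target, dict) else {}
    for key, value in patch.items():
        if value is None:
            result.pop(key, None)  # null means "remove this key"
        else:
            result[key] = json_merge_patch(result.get(key), value)
    return result


payload = {"title": "intro", "tags": ["a", "b"], "meta": {"lang": "en", "draft": True}}
partial = {"tags": ["c"], "meta": {"draft": None}, "category": "docs"}
merged = json_merge_patch(payload, partial)
print(merged)
```

Two consequences are worth noting: nested objects merge key by key, while lists are replaced as a whole, and a `None` (JSON `null`) value deletes the key instead of storing a null.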

## Errors

HTTP status codes map to typed exceptions (all subclass `LSMVecError`):

| Status | Exception |
|---|---|
| 400 | `InvalidArgument` |
| 401 | `Unauthorized` |
| 404 | `NotFound` |
| 413 | `PayloadTooLarge` |
| 429 | `RateLimited` |
| 5xx | `ServerError` |

```python
from lsmvec_client import NotFound

try:
    client.get(999999)
except NotFound:
    print("no such id")
```
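
Transient failures such as 429 or 5xx are often handled by retrying with exponential backoff. The following is a minimal sketch of that pattern, not a feature of the client: `call_with_retry` and `FakeRateLimited` are hypothetical names, and the stand-in exception exists only so the example runs without a server.

```python
import time
from typing import Callable, List, Tuple, Type, TypeVar

T = TypeVar("T")


def call_with_retry(
    fn: Callable[[], T],
    retry_on: Tuple[Type[BaseException], ...],
    attempts: int = 5,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn(), retrying on the given exception types with exponential backoff."""
    for attempt in range(attempts):
        try:
            return fn()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(base_delay * (2 ** attempt))
    raise ValueError("attempts must be >= 1")


class FakeRateLimited(Exception):
    """Stand-in for a 429 error so the sketch runs offline (hypothetical)."""


calls = {"n": 0}


def flaky() -> str:
    # Fails twice, then succeeds, to exercise the retry path.
    calls["n"] += 1
    if calls["n"] < 3:
        raise FakeRateLimited()
    return "ok"


delays: List[float] = []
result = call_with_retry(flaky, (FakeRateLimited,), sleep=delays.append)
print(result, delays)
```

With the real client, the same helper could wrap a call such as `lambda: client.search(vec, k=10)` with `retry_on=(RateLimited, ServerError)`, assuming those classes are importable from `lsmvec_client` in the same way as `NotFound`. Retrying writes is a separate decision, since a request that failed with a 5xx may still have been applied.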

## Notes

- Vectors are stored with 8-bit scalar quantization (SQ8). `get()`
  returns the dequantized vector, which differs from the input by
  up to ~`range/255` per element. Distances and recall are computed
  on the quantized form.
- `id` is a 64-bit unsigned integer.
- The client is synchronous and connection-per-request (stdlib
  `urllib`). For high-throughput batch ingestion, prefer
  `bulk_build` over a loop of `insert`.
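
To make the SQ8 note concrete, the sketch below round-trips a vector through a simple 8-bit linear quantizer and measures the per-element error. It is an illustration under stated assumptions, not the server's implementation: it assumes the range is the vector's own min and max, whereas the ranges LSM-Vec actually uses (per vector, per dimension, or global) are not described here. `sq8_encode` and `sq8_decode` are hypothetical names.

```python
from typing import List, Tuple


def sq8_encode(vec: List[float]) -> Tuple[List[int], float, float]:
    """Map each element to an integer code in 0..255 over the vector's own [lo, hi]."""
    lo, hi = min(vec), max(vec)
    scale = (hi - lo) / 255 or 1.0  # fall back to 1.0 for constant vectors
    codes = [round((x - lo) / scale) for x in vec]
    return codes, lo, scale


def sq8_decode(codes: List[int], lo: float, scale: float) -> List[float]:
    """Reconstruct approximate floats from the 8-bit codes."""
    return [lo + c * scale for c in codes]


vec = [0.10, 0.20, 0.30, 0.95]
codes, lo, scale = sq8_encode(vec)
restored = sq8_decode(codes, lo, scale)
max_err = max(abs(a - b) for a, b in zip(vec, restored))

print(codes)
print(restored)
print(max_err <= scale)  # error stays within one quantization step (range/255)
```

The practical consequence is that comparing the result of `get()` to the original input needs a tolerance rather than exact equality.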

## Testing

Against a running server:

```bash
LSMVEC_TEST_URL=http://localhost:8000 LSMVEC_TEST_DIM=8 \
    python3 sdk/python/tests/test_client.py
```
