Metadata-Version: 2.4
Name: vd
Version: 0.2.8
Summary: A facade over vector databases — one interface, ~15 backends
Project-URL: Homepage, https://github.com/i2mint/vd
License: mit
License-File: LICENSE
Keywords: ANN,embeddings,facade,vector database,vector search
Requires-Python: >=3.10
Requires-Dist: pyyaml>=6.0
Provides-Extra: all-backends
Requires-Dist: chromadb>=0.4.0; extra == 'all-backends'
Requires-Dist: duckdb; extra == 'all-backends'
Requires-Dist: elasticsearch; extra == 'all-backends'
Requires-Dist: faiss-cpu; extra == 'all-backends'
Requires-Dist: lancedb; extra == 'all-backends'
Requires-Dist: milvus-lite; (platform_system != 'Windows') and extra == 'all-backends'
Requires-Dist: numpy; extra == 'all-backends'
Requires-Dist: pgvector; extra == 'all-backends'
Requires-Dist: pinecone; extra == 'all-backends'
Requires-Dist: psycopg[binary]; extra == 'all-backends'
Requires-Dist: pymilvus; extra == 'all-backends'
Requires-Dist: pymongo; extra == 'all-backends'
Requires-Dist: qdrant-client; extra == 'all-backends'
Requires-Dist: redis; extra == 'all-backends'
Requires-Dist: sqlite-vec; extra == 'all-backends'
Requires-Dist: turbopuffer; extra == 'all-backends'
Requires-Dist: weaviate-client; extra == 'all-backends'
Provides-Extra: chroma
Requires-Dist: chromadb>=0.4.0; extra == 'chroma'
Provides-Extra: config
Requires-Dist: tomli-w>=1.0.0; extra == 'config'
Requires-Dist: tomli>=2.0.0; (python_version < '3.11') and extra == 'config'
Provides-Extra: dev
Requires-Dist: chromadb>=0.4.0; extra == 'dev'
Requires-Dist: duckdb; extra == 'dev'
Requires-Dist: faiss-cpu; extra == 'dev'
Requires-Dist: lancedb; extra == 'dev'
Requires-Dist: numpy; extra == 'dev'
Requires-Dist: pytest-asyncio>=0.23; extra == 'dev'
Requires-Dist: pytest-cov>=4.0; extra == 'dev'
Requires-Dist: pytest>=7.0; extra == 'dev'
Requires-Dist: qdrant-client; extra == 'dev'
Requires-Dist: ruff>=0.1.0; extra == 'dev'
Requires-Dist: sqlite-vec; extra == 'dev'
Requires-Dist: tomli-w>=1.0.0; extra == 'dev'
Provides-Extra: docs
Requires-Dist: sphinx-rtd-theme>=1.0; extra == 'docs'
Requires-Dist: sphinx>=6.0; extra == 'docs'
Provides-Extra: duckdb
Requires-Dist: duckdb; extra == 'duckdb'
Provides-Extra: elasticsearch
Requires-Dist: elasticsearch; extra == 'elasticsearch'
Provides-Extra: embedded
Requires-Dist: chromadb>=0.4.0; extra == 'embedded'
Requires-Dist: duckdb; extra == 'embedded'
Requires-Dist: faiss-cpu; extra == 'embedded'
Requires-Dist: lancedb; extra == 'embedded'
Requires-Dist: numpy; extra == 'embedded'
Requires-Dist: qdrant-client; extra == 'embedded'
Requires-Dist: sqlite-vec; extra == 'embedded'
Provides-Extra: faiss
Requires-Dist: faiss-cpu; extra == 'faiss'
Requires-Dist: numpy; extra == 'faiss'
Provides-Extra: lancedb
Requires-Dist: lancedb; extra == 'lancedb'
Provides-Extra: milvus
Requires-Dist: milvus-lite; (platform_system != 'Windows') and extra == 'milvus'
Requires-Dist: pymilvus; extra == 'milvus'
Provides-Extra: mongodb
Requires-Dist: pymongo; extra == 'mongodb'
Provides-Extra: pgvector
Requires-Dist: pgvector; extra == 'pgvector'
Requires-Dist: psycopg[binary]; extra == 'pgvector'
Provides-Extra: pinecone
Requires-Dist: pinecone; extra == 'pinecone'
Provides-Extra: qdrant
Requires-Dist: qdrant-client; extra == 'qdrant'
Provides-Extra: redis
Requires-Dist: numpy; extra == 'redis'
Requires-Dist: redis; extra == 'redis'
Provides-Extra: sqlite-vec
Requires-Dist: sqlite-vec; extra == 'sqlite-vec'
Provides-Extra: turbopuffer
Requires-Dist: turbopuffer; extra == 'turbopuffer'
Provides-Extra: weaviate
Requires-Dist: weaviate-client; extra == 'weaviate'
Description-Content-Type: text/markdown

# vd

**A facade over vector databases — one Pythonic interface, ~15 backends.**

`vd` lets you operate on any vector database and switch between them with a
one-word change, while keeping each backend's particular power one escape hatch
away. It also helps you *choose* the right backend and *set it up*.

```python
import vd

client = vd.connect("memory")          # switch DB = change this one word
col = client.create_collection("docs")
col["a"] = vd.Document(id="a", text="cats", vector=[0.1, 0.9, 0.0])
col["b"] = vd.Document(id="b", text="pizza", vector=[0.9, 0.0, 0.1])

for hit in col.search([0.1, 0.8, 0.0], limit=2):
    print(hit["id"], hit["score"])
```

## Install

```bash
pip install vd                 # core (zero heavy deps) + the memory backend
pip install vd[chroma]         # + a specific backend's client
pip install vd[embedded]       # + all embedded backends (chroma, qdrant, faiss, …)
pip install vd[all-backends]   # + every backend client
```

The core is near-zero-dependency. Each backend's client library is an optional
extra named after the backend.

## The mental model

`vd` stores and searches **vectors**. Turning text into vectors — *embedding* —
is deliberately **external**: `vd` never embeds on its own. This keeps the
facade honest (most vector DBs do not embed for you) and lightweight.

- **Vector-first.** You hold the embedding model. Hand `vd` `Document`s that
  already carry a `vector`; search with a pre-computed query vector.
- **Text convenience.** Pass an `embedder` (`text -> vector`) to `connect`, and
  then raw text works: `col["k"] = "some text"`, `col.search("a query")`.

With no embedder, passing text raises `EmbeddingRequiredError` — loud, never a
silent wrong-model embedding.

```python
client = vd.connect("chroma", persist_directory="./db", embedder=my_embed_fn)
col = client.create_collection("docs")
col["a"] = "cats and kittens"                  # embedded for you
hits = list(col.search("pets", limit=5))       # query embedded for you
```

## Choosing a backend

`vd` ships a provider registry distilled from a practitioner report
(`misc/docs/11 -- VectorDB Selection & Setup Guide ...md`) and a recommender:

```python
vd.print_recommendation(
    corpus_size="medium", persistence=True, can_run_docker=True,
    cloud_ok=True, budget="free", needs_hybrid=False,
)
vd.print_backends_table()                       # the whole landscape
vd.compare_backends(["chroma", "qdrant", "pgvector"])
```

## Setting a backend up

```python
vd.check_requirements("qdrant")    # diagnoses readiness, prints the next step
vd.setup_guide("qdrant")           # full pip / docker / env-var playbook
vd.install_backend("qdrant")       # the pip command (run=True to install)
```

`check_requirements` is deployment-aware: it checks the pip package for
embedded backends, whether a server answers for self-hosted ones, and the
required environment variables for managed ones — always ending with one
concrete next action.

## The API

| Object | Is a | Plus |
|--------|------|------|
| `Client` (from `connect`) | `Mapping[str, Collection]` | `create_collection`, `get_collection`, `delete_collection`, `get_or_create_collection` |
| `Collection` | `MutableMapping[str, Document]` | `search(...)` |
| `Document` | dataclass | `id`, `text`, `vector`, `metadata` |

```python
col["k"] = vd.Document(id="k", text="…", vector=[...], metadata={"y": 2024})
doc      = col["k"]            # get
del col["k"]                  # delete
"k" in col, len(col), list(col)

col.search(query, *, limit=10, filter=None, egress=None, **backend_kwargs)
```

`search` yields dicts `{"id", "text", "score", "metadata"}` (`score` is
higher-is-better). Transform results with an `egress`: `vd.id_only`,
`vd.id_and_score`, `vd.text_only`, `vd.id_text_score`, or your own.

### Metadata filtering

One backend-agnostic, MongoDB-style filter language — `$eq $ne $gt $gte $lt
$lte $in $nin $exists $and $or $not`:

```python
col.search(qvec, filter={"year": {"$gte": 2020}, "kind": {"$in": ["news", "blog"]}})
```

Each backend declares which operators it honors natively; an unsupported one
raises `UnsupportedFilterError` rather than silently mis-filtering. Backends
with rich native filtering (Qdrant, Pinecone, MongoDB) translate the filter;
the rest apply it client-side with the same semantics.

### Escape hatches

The facade never traps you. `client.client` is the raw backend client;
`collection.native` is the raw backend collection — both supported, documented
API for reaching backend-specific features.

## Backends

| Archetype | Backends |
|-----------|----------|
| **Embedded** (pip-only) | `memory`, `chroma`, `lancedb`, `sqlite_vec`, `duckdb`, `faiss` |
| **Server** (also embedded) | `qdrant`, `weaviate`, `milvus` |
| **Server** | `redis`, `elasticsearch`, `pgvector` |
| **Managed** | `pinecone`, `mongodb` (Atlas), `turbopuffer` |

`vd.list_backends()` shows what is installed and ready now.

## The toolkit

Beyond the facade, `vd` bundles the composite operations people actually do:

- **`vd.search`** — `multi_query_search`, `reciprocal_rank_fusion`,
  `search_similar_to_document`, `deduplicate_results`.
- **`vd.io`** — `export_collection` / `import_collection` (JSONL, JSON,
  directory).
- **`vd.migration`** — `migrate_collection`, `migrate_client`,
  `copy_collection` — move data between *any* two backends.
- **`vd.analytics`** — `collection_stats`, `find_duplicates`, `find_outliers`,
  `validate_collection`.
- **`vd.health`** — `health_check_backend`, `benchmark_search`.
- **`vd.text`** — convenience text cleaning / chunking.
- **`vd.TimeIndexedCollection`** — a time-windowed wrapper over any collection.
- **CLI** — `vd backends`, `vd install`, `vd export/import`, `vd migrate`, …

## AI-agent skills

`vd` ships skills (`vd/data/skills/`) so coding agents can drive it well:
`vd-quickstart`, `vd-backend-choose` (choosing **and** setup), `vd-ingest`,
`vd-search`, `vd-ops`.

## Design

- **Embedding is external.** The core operates on vectors; an `embedder` is an
  injected, optional convenience — never a hard dependency.
- **Two mappings.** A `Client` is a `Mapping` of collections; a `Collection` is
  a `MutableMapping` of documents plus `search`. Idiomatic, minimal, familiar.
- **Thin adapters.** `AbstractClient` / `AbstractCollection` implement
  everything users see; a backend supplies a handful of raw primitives. Adding
  a backend is ~150 lines — see the `vd-add-backend` skill.
- **Capabilities, not a fat base.** Optional features (`SupportsBatch`,
  `SupportsHybrid`) are `@runtime_checkable` protocols you feature-discover.

## License

MIT
