Metadata-Version: 2.4
Name: vectrixdb
Version: 1.9.2
Summary: Where vectors come alive - A lightweight, visual-first vector database with embedded ML models
Project-URL: Homepage, https://github.com/knowusuboaky/VectrixDB
Project-URL: Documentation, https://github.com/knowusuboaky/VectrixDB#readme
Project-URL: Repository, https://github.com/knowusuboaky/VectrixDB
Author-email: Kwadwo Daddy Nyame Owusu - Boakye <kwadwo.owusuboakye@outlook.com>
License-Expression: Apache-2.0
License-File: LICENSE
Keywords: ai,approximate-nearest-neighbors,embeddings,hnsw,machine-learning,similarity-search,vector-database
Classifier: Development Status :: 5 - Production/Stable
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: Apache Software License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Database
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Requires-Python: >=3.9
Requires-Dist: aiosqlite>=0.19.0
Requires-Dist: fastapi>=0.109.0
Requires-Dist: httpx>=0.26.0
Requires-Dist: numpy>=1.24.0
Requires-Dist: onnxruntime>=1.15.0
Requires-Dist: orjson>=3.9.0
Requires-Dist: pydantic>=2.0.0
Requires-Dist: rich>=13.0.0
Requires-Dist: tokenizers>=0.15.0
Requires-Dist: typer>=0.9.0
Requires-Dist: usearch>=2.0.0
Requires-Dist: uvicorn[standard]>=0.27.0
Requires-Dist: websockets>=12.0
Provides-Extra: all
Requires-Dist: black>=23.0.0; extra == 'all'
Requires-Dist: fastembed>=0.2.0; extra == 'all'
Requires-Dist: mypy>=1.0.0; extra == 'all'
Requires-Dist: pytest-asyncio>=0.23.0; extra == 'all'
Requires-Dist: pytest-cov>=4.0.0; extra == 'all'
Requires-Dist: pytest>=7.0.0; extra == 'all'
Requires-Dist: ruff>=0.1.0; extra == 'all'
Requires-Dist: scikit-learn>=1.3.0; extra == 'all'
Requires-Dist: sentence-transformers>=2.2.0; extra == 'all'
Requires-Dist: umap-learn>=0.5.0; extra == 'all'
Provides-Extra: dev
Requires-Dist: black>=23.0.0; extra == 'dev'
Requires-Dist: mypy>=1.0.0; extra == 'dev'
Requires-Dist: pytest-asyncio>=0.23.0; extra == 'dev'
Requires-Dist: pytest-cov>=4.0.0; extra == 'dev'
Requires-Dist: pytest>=7.0.0; extra == 'dev'
Requires-Dist: ruff>=0.1.0; extra == 'dev'
Provides-Extra: embeddings
Requires-Dist: fastembed>=0.2.0; extra == 'embeddings'
Requires-Dist: sentence-transformers>=2.2.0; extra == 'embeddings'
Provides-Extra: fastembed
Requires-Dist: fastembed>=0.2.0; extra == 'fastembed'
Provides-Extra: hf
Requires-Dist: sentence-transformers>=2.2.0; extra == 'hf'
Provides-Extra: setup-models
Requires-Dist: optimum[onnxruntime]>=1.12.0; extra == 'setup-models'
Requires-Dist: torch>=2.0.0; extra == 'setup-models'
Requires-Dist: transformers>=4.30.0; extra == 'setup-models'
Provides-Extra: viz
Requires-Dist: scikit-learn>=1.3.0; extra == 'viz'
Requires-Dist: umap-learn>=0.5.0; extra == 'viz'
Description-Content-Type: text/markdown

# VectrixDB

[![License: Apache 2.0](https://img.shields.io/badge/License-Apache%202.0-blue.svg)](https://opensource.org/licenses/Apache-2.0)
[![Python Versions](https://img.shields.io/pypi/pyversions/vectrixdb.svg)](https://pypi.org/project/vectrixdb/)
[![VectrixDB Version](https://img.shields.io/pypi/v/vectrixdb.svg)](https://pypi.org/project/vectrixdb/)
[![Downloads](https://pepy.tech/badge/vectrixdb)](https://pepy.tech/project/vectrixdb)
[![Issues](https://img.shields.io/github/issues/knowusuboaky/VectrixDB)](https://github.com/knowusuboaky/VectrixDB/issues)
[![Contact](https://img.shields.io/badge/Email-Contact-green.svg)](mailto:kwadwo.owusuboakye@outlook.com)

**Where vectors come alive.**

A lightweight vector database with embedded ML models, a built-in web dashboard, and GraphRAG support - no API keys required.

---

## Features

- **4 Search Modes** - Dense, Hybrid, Ultimate, and Graph (GraphRAG)
- **Embedded Models** - Works offline with bundled ONNX models
- **Model Selection** - Choose from bundled, HuggingFace, or GitHub release models
- **Visual Dashboard** - Built-in web UI for managing collections
- **Zero Config** - Just `pip install` and start using it

## Installation

```bash
pip install vectrixdb
```

## Quick Start

```python
from vectrixdb import Vectrix

db = Vectrix("my_docs")
db.add(["Python is great", "JavaScript powers the web", "Rust is fast"])

results = db.search("programming")
print(results.top.text)
```

## Search Modes

VectrixDB offers 4 search modes, each building on the previous one:

| Mode | Components | Best For |
|------|------------|----------|
| `dense` | Vector similarity | Fast semantic search |
| `hybrid` | Dense + Sparse + Reranker | Keyword + semantic matching |
| `ultimate` | Hybrid + ColBERT | Maximum accuracy |
| `graph` | Ultimate + Knowledge Graph | Complex reasoning (GraphRAG) |
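
For intuition about the `dense` row, here is a minimal brute-force sketch of ranking already-embedded vectors by cosine similarity. It is not VectrixDB's implementation: the package depends on `usearch`, an HNSW approximate-nearest-neighbor index, so real searches approximate this exact ranking much faster. The toy 3-dimensional vectors and the `cosine_top_k` name are illustrative assumptions.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def cosine_top_k(
    query: Sequence[float], vectors: Dict[str, Sequence[float]], k: int = 3
) -> List[Tuple[str, float]]:
    """Exact (brute-force) top-k by cosine similarity; an HNSW index approximates this."""
    scored = [(doc_id, cosine(query, vec)) for doc_id, vec in vectors.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical toy embeddings; a model such as e5-small produces 384 dimensions.
docs = {
    "python": [0.9, 0.1, 0.0],
    "javascript": [0.7, 0.3, 0.1],
    "rust": [0.2, 0.9, 0.1],
}
print(cosine_top_k([1.0, 0.2, 0.0], docs, k=2))
```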

```python
# Choose your mode
db = Vectrix("docs", mode="dense")     # Fastest
db = Vectrix("docs", mode="hybrid")    # Balanced
db = Vectrix("docs", mode="ultimate")  # Best quality
db = Vectrix("docs", mode="graph")     # GraphRAG
```
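
The richer modes have to merge several ranked lists (for example, dense and sparse results in `hybrid`). This README does not say how VectrixDB fuses them; the sketch below assumes reciprocal rank fusion (RRF), a common choice, with the conventional constant `k=60`. The function name and document IDs are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def reciprocal_rank_fusion(rankings: Sequence[Sequence[str]], k: int = 60) -> List[str]:
    """Fuse several ranked ID lists: each list adds 1 / (k + rank) per document."""
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID for deterministic output.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


dense_hits = ["doc_a", "doc_b", "doc_c"]   # ranked by vector similarity
sparse_hits = ["doc_b", "doc_d", "doc_a"]  # ranked by keyword (sparse) score
print(reciprocal_rank_fusion([dense_hits, sparse_hits]))
```

Here `doc_b` (2nd and 1st) edges out `doc_a` (1st and 3rd): RRF rewards consistent placement across lists rather than a single top rank, and it needs no score normalization between the dense and sparse retrievers.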

## Model Selection

Customize models for each component. Models load from 3 sources:

### 1. Bundled Models (Offline)

Pre-packaged ONNX models that work without an internet connection (~100MB total):

```python
db = Vectrix(
    "docs",
    mode="ultimate",
    dense_model="e5-small",
    sparse_model="bm25",
    reranker_model="L12",
    late_interaction_model="colbert",
)
```

| Component | Alias | Model | Size |
|-----------|-------|-------|------|
| Dense | `e5-small` | intfloat/e5-small-v2 | 33MB |
| Sparse | `bm25` | BM25 vocabulary | 1MB |
| Reranker | `L12` | ms-marco-MiniLM-L12-v2 | 33MB |
| ColBERT | `colbert` | answerai-colbert-small-v1 | 33MB |
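
The bundled `bm25` sparse component is described only as a BM25 vocabulary. For reference, this is the standard Okapi BM25 scoring formula as a self-contained sketch with the usual defaults `k1=1.5` and `b=0.75`. The naive whitespace tokenizer and the `bm25_scores` name are simplifying assumptions, not VectrixDB's tokenizer or code.

```python
import math
from collections import Counter
from typing import List


def bm25_scores(query: str, corpus: List[str], k1: float = 1.5, b: float = 0.75) -> List[float]:
    """Okapi BM25 score of every document in `corpus` for `query`."""
    docs = [doc.lower().split() for doc in corpus]  # naive whitespace tokenizer
    n_docs = len(docs)
    avgdl = sum(len(d) for d in docs) / n_docs
    # Document frequency: in how many documents each term appears.
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            # Non-negative IDF variant (as used by Lucene).
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1.0)
            # Term-frequency saturation (k1) and document-length normalization (b).
            norm = tf[term] + k1 * (1 - b + b * len(d) / avgdl)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


corpus = ["Python is great", "JavaScript powers the web", "Rust is fast"]
print(bm25_scores("rust fast", corpus))
```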

### 2. HuggingFace Models

Use any compatible model from the HuggingFace Hub (downloaded on first use):

```python
db = Vectrix(
    "docs",
    mode="hybrid",
    dense_model="BAAI/bge-large-en-v1.5",
    sparse_model="naver/splade-cocondenser-ensembledistil",
    reranker_model="cross-encoder/ms-marco-MiniLM-L-12-v2",
)
```

**Compatible models:**
- Dense: `BAAI/bge-large-en-v1.5`, `intfloat/e5-large-v2`, `sentence-transformers/all-mpnet-base-v2`
- Sparse: `naver/splade-cocondenser-ensembledistil`
- Reranker: `cross-encoder/ms-marco-MiniLM-L-12-v2`, `BAAI/bge-reranker-base`
- ColBERT: `jinaai/jina-colbert-v2`, `colbert-ir/colbertv2.0`

### 3. GitHub Release Models

Larger models hosted on GitHub releases (auto-downloaded on first use):

```python
db = Vectrix(
    "docs",
    mode="ultimate",
    dense_model="github:bge-small",
    sparse_model="github:splade",
    reranker_model="github:reranker-l6",
    late_interaction_model="github:bge-m3",
)
```

| Tag | Model | Type | Languages | Size |
|-----|-------|------|-----------|------|
| `github:bge-small` | BAAI/bge-small-en-v1.5 | Dense | EN | 127MB |
| `github:e5-small` | intfloat/e5-small-v2 FP32 | Dense | EN | 127MB |
| `github:dense-multi` | multilingual-e5-small | Dense | 100+ | 113MB |
| `github:splade` | SPLADE++ | Sparse | EN | 508MB |
| `github:reranker-l6` | ms-marco-MiniLM-L6-v2 | Reranker | EN | 87MB |
| `github:reranker-multi` | mMiniLMv2-L12 | Reranker | 15+ | 113MB |
| `github:bge-m3` | BGE-M3 | ColBERT | 100+ | 563MB |
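
The three sources can be told apart by the model string itself: a short bundled alias, a `github:` tag, or a HuggingFace `org/name` repo ID. The hypothetical resolver below shows one way such strings could be classified. It is not VectrixDB's loader, and its rules (aliases from the bundled table, a slash meaning HuggingFace, anything else rejected) are assumptions drawn from the examples in this section.

```python
from typing import Tuple

# Aliases taken from the bundled-models table; the lookup itself is illustrative.
BUNDLED_ALIASES = {"e5-small", "bm25", "L12", "colbert"}


def resolve_model_source(spec: str) -> Tuple[str, str]:
    """Classify a model string as ("bundled" | "github" | "huggingface", name)."""
    if spec.startswith("github:"):
        return "github", spec[len("github:"):]
    if spec in BUNDLED_ALIASES:
        return "bundled", spec
    if "/" in spec:
        return "huggingface", spec
    # This sketch rejects anything else; the real library may accept more forms.
    raise ValueError(f"Unrecognized model spec: {spec!r}")


for spec in ["e5-small", "github:bge-m3", "BAAI/bge-large-en-v1.5"]:
    print(spec, "->", resolve_model_source(spec))
```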

## Metadata & Filtering

```python
db.add(
    texts=["iPhone 15", "Galaxy S24", "Pixel 8"],
    metadata=[
        {"brand": "Apple", "price": 999},
        {"brand": "Samsung", "price": 899},
        {"brand": "Google", "price": 699}
    ]
)

results = db.search("smartphone", filter={"brand": "Apple"})
```
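
This README shows only an equality filter. Assuming a filter dict means every key must be present in a document's metadata with an equal value (AND semantics), the matching step amounts to the sketch below. Range or other operators are not documented here, whether VectrixDB filters before or after the vector search is not stated, and `matches_filter` / `apply_filter` are hypothetical helpers.

```python
from typing import Any, Dict, List


def matches_filter(metadata: Dict[str, Any], flt: Dict[str, Any]) -> bool:
    """True if every filter key is present in metadata with an equal value."""
    return all(key in metadata and metadata[key] == value for key, value in flt.items())


def apply_filter(items: List[Dict[str, Any]], flt: Dict[str, Any]) -> List[str]:
    """Return the texts of items whose metadata satisfies the filter."""
    return [item["text"] for item in items if matches_filter(item["metadata"], flt)]


items = [
    {"text": "iPhone 15", "metadata": {"brand": "Apple", "price": 999}},
    {"text": "Galaxy S24", "metadata": {"brand": "Samsung", "price": 899}},
    {"text": "Pixel 8", "metadata": {"brand": "Google", "price": 699}},
]
print(apply_filter(items, {"brand": "Apple"}))
```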

## Storage Backends

Use external storage backends (Lakebase, DeltaLake, CosmosDB) with full search mode support:

```python
from vectrixdb import Vectrix, VectrixDB

# Connect to Lakebase (PostgreSQL + pgvector)
lakebase = VectrixDB.with_lakebase(
    host="your-lakebase-host",
    database="vectrixdb",
    user="your-user",
    password="your-password",
)

# Use Vectrix with storage backend + ultimate mode
db = Vectrix(
    "products",
    mode="ultimate",
    dense_model="github:bge-small",
    sparse_model="github:splade",
    reranker_model="github:reranker-l6",
    late_interaction_model="colbert",
    storage_backend=lakebase,
)

db.add(texts=["Product A", "Product B"])
results = db.search("query")  # Full ultimate search from Lakebase
```

### Adaptive Schema

The storage schema adapts to the selected mode:

| Mode | Columns Created |
|------|-----------------|
| `dense` | `dense_embedding` |
| `hybrid` | `dense_embedding` + `sparse_embedding` |
| `ultimate` | `dense_embedding` + `sparse_embedding` + `late_interaction_embedding` |
| `graph` | Same as ultimate + graph tables |

All modes also store the raw `text_content`, because reranker scores are computed from the text at query time rather than stored.
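
Because the reranker reads raw text at query time, it runs as a second stage over candidates the first stage already retrieved. A minimal sketch of that retrieve-then-rerank flow, where the hypothetical `score_pair` (simple word overlap) stands in for a cross-encoder such as ms-marco-MiniLM, which would instead run a model over each (query, text) pair:

```python
from typing import Callable, List, Tuple


def score_pair(query: str, text: str) -> float:
    """Toy relevance score: fraction of query words that appear in the text."""
    q_words = set(query.lower().split())
    t_words = set(text.lower().split())
    return len(q_words & t_words) / len(q_words) if q_words else 0.0


def rerank(
    query: str,
    candidates: List[str],
    scorer: Callable[[str, str], float] = score_pair,
    top_n: int = 2,
) -> List[Tuple[str, float]]:
    """Re-score first-stage candidates on their stored text and keep the best top_n."""
    rescored = [(text, scorer(query, text)) for text in candidates]
    rescored.sort(key=lambda pair: pair[1], reverse=True)
    return rescored[:top_n]


candidates = ["Product A is a blue chair", "Product B is a red lamp", "A red chair"]
print(rerank("red chair", candidates))
```

Rerankers are too slow to run over a whole collection, which is why they only see the short candidate list; keeping `text_content` in storage is what makes this second pass possible.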

## REST API

Start the server:

```bash
VECTRIXDB_API_KEY=your_secret vectrixdb serve --port 7337
```

Then open the dashboard at `http://localhost:7337/dashboard`.

### API Examples

```bash
# Create collection
curl -X POST http://localhost:7337/api/v1/collections \
  -H "Content-Type: application/json" \
  -H "api-key: your_secret" \
  -d '{"name": "docs", "dimension": 384}'

# Add documents (auto-embedding)
curl -X POST http://localhost:7337/api/v1/collections/docs/text-upsert \
  -H "Content-Type: application/json" \
  -H "api-key: your_secret" \
  -d '{"points": [{"id": "1", "text": "Hello world"}]}'

# Search
curl -X POST http://localhost:7337/api/v1/collections/docs/text-search \
  -H "Content-Type: application/json" \
  -H "api-key: your_secret" \
  -d '{"query_text": "greeting", "limit": 10}'
```
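
The same calls can be made from Python with only the standard library. This sketch builds the three requests above with `urllib.request`, reusing the endpoint paths, `api-key` header, and payloads from the curl examples; the `build_request` helper is hypothetical, and actually sending a request needs a running server (see the commented `urlopen` lines).

```python
import json
import urllib.request
from typing import Any, Dict

BASE_URL = "http://localhost:7337/api/v1"
API_KEY = "your_secret"  # must match VECTRIXDB_API_KEY on the server


def build_request(path: str, payload: Dict[str, Any]) -> urllib.request.Request:
    """Build a JSON POST request with the api-key header used by the curl examples."""
    return urllib.request.Request(
        f"{BASE_URL}{path}",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", "api-key": API_KEY},
        method="POST",
    )


create = build_request("/collections", {"name": "docs", "dimension": 384})
upsert = build_request(
    "/collections/docs/text-upsert", {"points": [{"id": "1", "text": "Hello world"}]}
)
search = build_request("/collections/docs/text-search", {"query_text": "greeting", "limit": 10})

# With the server running:
# with urllib.request.urlopen(search) as resp:
#     print(json.loads(resp.read()))
```

`urllib` normalizes header names (for example to `Api-key`); HTTP header names are case-insensitive, so this matches the `api-key` header in the curl examples.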

## Project Structure

```
VectrixDB/
├── vectrixdb/
│   ├── core/           # Vector index, storage, search
│   │   ├── graphrag/   # Knowledge graph
│   │   └── search/     # Search algorithms
│   ├── api/            # FastAPI server
│   ├── models/         # Embedded ONNX models
│   ├── dashboard/      # Web UI
│   └── cli.py          # Command line
├── tests/
└── requirements.txt
```

## Install from Source

```bash
git clone https://github.com/knowusuboaky/VectrixDB.git
cd VectrixDB
pip install -e .
```

## Requirements

- Python 3.9+
- No API keys needed
- Models are bundled or auto-downloaded

## License

Apache 2.0

## Author

**Kwadwo Daddy Nyame Owusu - Boakye**

GitHub: [@knowusuboaky](https://github.com/knowusuboaky)
