Welcome to VectrixDB
Where vectors come alive
Collections
Browse and manage vector collections
Console
Execute API requests directly
Search
Semantic vector search
Tutorials
Get Started with Demo Data
Load demo data to explore all VectrixDB features: semantic search, keyword search, reranking, and knowledge graphs.
Learn the basics: create a collection, add data, and perform your first search.
- Click Load Demo above
- Go to Collections → click demo
- Browse the points to see the data
- Go to Search and try all search modes
Find results by meaning, not just keywords. Ask questions in natural language.
Traditional search that matches exact words. Great for names, codes, or specific terms.
Dense search with cross-encoder re-ranking for improved accuracy.
Token-level matching for maximum accuracy. Best for complex queries.
Automatically extracts entities and relationships, enabling graph-based retrieval.
- Auto entity extraction (People, Places, Concepts)
- Relationship mapping between entities
- Graph traversal for connected results
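Graph-based retrieval can be pictured with a small, library-free sketch. It assumes relationships are stored as (head, relation, tail) triples and uses a breadth-first traversal to find entities connected to a starting entity. The function names, the `max_hops` parameter, and the third triple are hypothetical illustrations, not VectrixDB internals.

```python
from collections import defaultdict, deque

# Triples of the form (head, relation, tail). The first two mirror the
# metformin example on this page; the third is a hypothetical addition
# so that a two-hop connection exists.
TRIPLES = [
    ("Metformin", "treats", "type 2 diabetes"),
    ("Metformin", "may have", "anticancer properties"),
    ("type 2 diabetes", "affects", "blood sugar"),
]


def build_graph(triples):
    """Index triples as an undirected adjacency list keyed by entity."""
    graph = defaultdict(list)
    for head, relation, tail in triples:
        graph[head].append((relation, tail))
        graph[tail].append((relation, head))
    return graph


def connected_entities(graph, start, max_hops=2):
    """Breadth-first traversal returning {entity: hop_count} within max_hops."""
    seen = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if seen[node] == max_hops:
            continue  # do not expand beyond the hop limit
        for _relation, neighbor in graph[node]:
            if neighbor not in seen:
                seen[neighbor] = seen[node] + 1
                queue.append(neighbor)
    return seen


graph = build_graph(TRIPLES)
hops = connected_entities(graph, "Metformin", max_hops=2)
print(hops)
```

Entities reached in fewer hops are more directly related to the query entity, which is one simple way a graph can surface connected results.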

    POST /api/collections
    {
      "name": "my_docs",
      "dimension": 384,
      "metric": "cosine"
    }

    POST /api/collections/{name}/points
    {
      "points": [{
        "id": "1",
        "text": "Your document text",
        "payload": {"source": "file.pdf"}
      }]
    }

    POST /api/collections/{name}/text-search
    {
      "text": "your query",
      "mode": "hybrid",
      "limit": 10
    }

    POST /api/collections/{name}/text-search
    {
      "text": "query",
      "filter": {
        "category": "science"
      }
    }
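The requests above can be assembled from Python with the standard library alone. This is a minimal sketch: the base URL assumes a server on localhost using port 7337 (the port shown in the serve command on this page), `build_post` is a hypothetical helper, and authentication headers are omitted because this page does not specify them.

```python
import json
from urllib import request

# Assumption: the server runs locally on port 7337; adjust for your deployment.
BASE_URL = "http://localhost:7337"


def build_post(path, payload):
    """Build (but do not send) a JSON POST request for the given API path."""
    body = json.dumps(payload).encode("utf-8")
    return request.Request(
        BASE_URL + path,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


create_req = build_post(
    "/api/collections",
    {"name": "my_docs", "dimension": 384, "metric": "cosine"},
)
search_req = build_post(
    "/api/collections/my_docs/text-search",
    {"text": "your query", "mode": "hybrid", "limit": 10},
)

print(create_req.get_method(), create_req.full_url)
print(search_req.get_method(), search_req.full_url)

# To actually send a request (requires a running server):
# with request.urlopen(search_req) as resp:
#     print(json.load(resp))
```

Building the request separately from sending it makes it easy to inspect the URL and body before anything touches the server.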
Code Tips
Copy-paste Python examples for VectrixDB.
The simplest way to use VectrixDB - one line to add, one line to search.

    from vectrixdb import Vectrix

    # Create database and add data
    db = Vectrix("my_docs", tier="hybrid")
    db.add(["Python is great", "JavaScript is fun", "Rust is fast"])

    # Search
    results = db.search("programming language", mode="hybrid")
    print(results.top.text)  # Best match
With metadata and filters:

    from vectrixdb import Vectrix

    db = Vectrix("products", tier="hybrid")
    db.add(
        texts=["iPhone 15 Pro", "Samsung Galaxy", "Pixel 8"],
        metadata=[
            {"brand": "Apple", "price": 999},
            {"brand": "Samsung", "price": 899},
            {"brand": "Google", "price": 699}
        ]
    )

    # Filter by metadata
    results = db.search("smartphone", filter={"brand": "Apple"})
    print(results.top.text)  # "iPhone 15 Pro"
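To make the filter semantics concrete, here is a library-free sketch of what an exact-match metadata filter might do. It assumes every key in the filter must equal the stored value (an AND across keys); the actual matching rules in VectrixDB may differ, and `matches` is a hypothetical helper.

```python
def matches(metadata, flt):
    """Return True if every filter key is present in metadata with an equal value."""
    return all(key in metadata and metadata[key] == value for key, value in flt.items())


# Same records as the product example above.
records = [
    {"text": "iPhone 15 Pro", "metadata": {"brand": "Apple", "price": 999}},
    {"text": "Samsung Galaxy", "metadata": {"brand": "Samsung", "price": 899}},
    {"text": "Pixel 8", "metadata": {"brand": "Google", "price": 699}},
]

apple = [r["text"] for r in records if matches(r["metadata"], {"brand": "Apple"})]
print(apple)
```

Under this assumption the filter narrows the candidate set first, and ranking by relevance then happens only among the records that pass.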
- dense - Vector similarity only
- sparse - BM25 keywords only
- hybrid - Dense + Sparse + Rerank
- ultimate - Hybrid + ColBERT

Examples for each tier:

    from vectrixdb import Vectrix

    # Simple semantic similarity
    db = Vectrix("docs", tier="dense")
    db.add(["The quick brown fox", "A fast auburn canine"])
    results = db.search("swift fox", mode="dense")

    from vectrixdb import Vectrix

    # Keywords + Meaning + Reranking
    db = Vectrix("scifact", tier="hybrid")
    db.add(["CRISPR enables precise gene editing", "mRNA vaccines trigger immune response"])

    # You can use different modes on hybrid tier
    results_hybrid = db.search("gene therapy", mode="hybrid")
    results_dense = db.search("immune system", mode="dense")
    results_sparse = db.search("CRISPR", mode="sparse")
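This page does not say how the hybrid tier combines its dense and sparse result lists. One common technique for merging ranked lists is reciprocal rank fusion (RRF), sketched below purely as an illustration. The document IDs are hypothetical, and `k=60` is a conventional constant rather than a VectrixDB setting.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked ID lists; each hit contributes 1 / (k + rank)."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first
    return sorted(scores, key=lambda doc_id: scores[doc_id], reverse=True)


# Hypothetical rankings from a dense search and a sparse (keyword) search
dense_ranking = ["doc_a", "doc_b", "doc_c"]
sparse_ranking = ["doc_c", "doc_a", "doc_d"]

fused = reciprocal_rank_fusion([dense_ranking, sparse_ranking])
print(fused)
```

Documents that rank well in both lists rise to the top, while a reranker (if used) would then reorder only the fused shortlist.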

    from vectrixdb import Vectrix

    # Hybrid + ColBERT late interaction
    db = Vectrix("scifact", tier="ultimate")
    db.add([
        "Sleep deprivation impairs memory consolidation",
        "Exercise reduces Alzheimer's disease risk",
        "The gut microbiome affects mental health"
    ])

    # Best for complex queries
    results = db.search(
        "How does sleep affect brain function?",
        mode="ultimate"
    )

    from vectrixdb import Vectrix

    # Auto-extracts entities and relationships
    db = Vectrix("biomedical", tier="graph")
    db.add([
        "Metformin treats type 2 diabetes.",
        "Metformin may have anticancer properties."
    ])

    # Internally extracts: Metformin --[treats]--> type 2 diabetes
    results = db.search("What does metformin treat?")

    import pandas as pd
    from vectrixdb import Vectrix

    df = pd.read_csv("data.csv")
    db = Vectrix("docs", tier="hybrid")
    db.add(
        texts=df["text"].tolist(),
        ids=df["id"].tolist(),
        metadata=df[["category", "author"]].to_dict("records")
    )

    from vectrixdb import Vectrix
    from pathlib import Path

    db = Vectrix("docs", tier="hybrid")
    for f in Path("./docs").glob("**/*.md"):
        content = f.read_text(encoding="utf-8")
        db.add(
            texts=[content],
            ids=[str(f)],
            metadata=[{"filename": f.name}]
        )

    from vectrixdb import Vectrix

    def chunk_text(text, chunk_size=500, overlap=50):
        """Split text into overlapping chunks (sizes are counted in words)."""
        # Guard against a zero or negative step, which would break range()
        if not 0 <= overlap < chunk_size:
            raise ValueError("overlap must be non-negative and smaller than chunk_size")
        words = text.split()
        chunks = []
        step = chunk_size - overlap
        for i in range(0, len(words), step):
            chunk = ' '.join(words[i:i + chunk_size])
            if chunk:
                chunks.append(chunk)
        return chunks

    db = Vectrix("docs", tier="hybrid")

    # Chunk and add
    long_doc = "Very long document text..." * 1000
    chunks = chunk_text(long_doc)
    db.add(
        texts=chunks,
        ids=[f"chunk_{i}" for i in range(len(chunks))],
        metadata=[{"chunk_idx": i} for i in range(len(chunks))]
    )
Index documents with their page/section hierarchy. The document index is separate from vector collections but uses the same storage backend.

    from vectrixdb import VectrixDB

    # Open database (uses same storage as collections)
    db = VectrixDB("./my_data")

    # Index a text document
    doc_info = db.documents.index_text(
        text="""# Introduction
    This is the first section.
    ## Background
    More details here.
    ## Methods
    The approach used.""",
        title="My Research Paper"
    )
    print(f"Indexed: {doc_info.doc_id} with {doc_info.node_count} nodes")

    # List all documents
    docs = db.documents.list_documents()
    for doc in docs:
        print(f"{doc.title}: {doc.section_count} sections")

    # doc_id is the ID returned when the document was indexed
    doc_id = doc_info.doc_id

    # Get document tree nodes
    nodes = db.documents.get_document_nodes(doc_id)
    for node in nodes:
        indent = " " * node.level
        print(f"{indent}{node.title}")

    # Get specific section
    section = db.documents.get_section(doc_id, section_title="Methods")
    print(section.text)

    # Get smart chunks with heading context
    chunks = db.documents.get_chunks(doc_id, chunk_size=512)
    for chunk in chunks:
        print(f"[{chunk.heading}] {chunk.text[:100]}...")

    # Use chunks with Vectrix for semantic search
    from vectrixdb import Vectrix

    vx = Vectrix("doc_search", tier="hybrid")
    vx.add(
        texts=[c.text for c in chunks],
        ids=[c.chunk_id for c in chunks],
        metadata=[{"doc_id": c.doc_id, "heading": c.heading} for c in chunks]
    )
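For intuition about what a section hierarchy looks like, the sketch below splits Markdown text into (level, title, text) nodes using only the standard library. It is an illustrative approximation and not how VectrixDB indexes documents; `parse_sections` is a hypothetical helper, and it does not handle headings inside fenced code.

```python
import re

# Matches Markdown ATX headings such as "# Title" or "## Subtitle"
HEADING = re.compile(r"^(#{1,6})\s+(.*)$")


def parse_sections(markdown_text):
    """Split Markdown into a flat list of (level, title, body text) nodes."""
    nodes = []
    current = None
    for line in markdown_text.splitlines():
        stripped = line.strip()
        match = HEADING.match(stripped)
        if match:
            current = {"level": len(match.group(1)), "title": match.group(2).strip(), "lines": []}
            nodes.append(current)
        elif current is not None and stripped:
            # Body text belongs to the most recent heading
            current["lines"].append(stripped)
    return [(n["level"], n["title"], " ".join(n["lines"])) for n in nodes]


doc = """# Introduction
This is the first section.
## Background
More details here.
## Methods
The approach used."""

sections = parse_sections(doc)
for level, title, text in sections:
    print("  " * (level - 1) + f"{title}: {text}")
```

Keeping the heading level with each node is what lets chunks carry their heading context into search results.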

    from vectrixdb import VectrixDB

    # Connect to Databricks Lakebase
    db = VectrixDB.with_lakebase(
        host="your-workspace.cloud.databricks.com",
        database="vectrixdb",
        token="dapi..."  # Databricks PAT
    )

    # Documents use the same Lakebase storage
    db.documents.index_text("Your document content...", title="Cloud Doc")
Use Delta Lake (Unity Catalog) for governed storage and sync to Lakebase for fast vector search.

    from vectrixdb import VectrixDB

    # Connect to Delta Lake via Unity Catalog
    db = VectrixDB.with_delta_lake(
        workspace_url="https://adb-123.azuredatabricks.net",
        token="dapi...",     # Databricks PAT
        catalog="main",      # Unity Catalog catalog
        schema="vectrixdb"   # Schema for VectrixDB tables
    )

    # Create collection (stored as Delta table)
    coll = db.create_collection("documents", dimension=384)
    coll.add(ids=["doc1"], vectors=[[0.1, 0.2, ...]])

    from vectrixdb import VectrixDB, VectrixSync

    # Source: Delta Lake (governed, ACID, slow search)
    delta = VectrixDB.with_delta_lake(
        workspace_url="https://adb-123.azuredatabricks.net",
        token="dapi...",
        catalog="main",
        schema="vectrixdb"
    )

    # Target: Lakebase (fast vector search)
    lakebase = VectrixDB.with_lakebase(
        host="your-workspace.cloud.databricks.com",
        token="dapi...",
        database="vectrixdb"
    )

    # Create sync + auto-sync (one-liner!)
    sync = VectrixSync(source=delta, target=lakebase)
    sync.auto()  # Full sync + starts scheduler

    # Now just write to Delta Lake - auto-synced!

    # One-liner: full sync if needed + start scheduler
    sync.auto(interval_minutes=5)

    # That's it! Now just write to Delta Lake
    delta.get_collection("docs").add(ids=["new1"], vectors=[[...]])
    # Auto-synced every 5 minutes!

    # Check status anytime
    status = sync.status()
    print(f"Last sync: {status.last_sync}, Lag: {status.lag_seconds:.0f}s")
Use language="en" for the bundled English models (faster, offline), or supply a custom model.

    from vectrixdb import Vectrix

    # English models - bundled, no download (~100MB total)
    db = Vectrix("docs", tier="hybrid", language="en")

    # Multilingual - auto-downloads on first use
    db = Vectrix("docs", tier="hybrid")  # or language="multi"

    # pip install sentence-transformers
    from vectrixdb import Vectrix

    # Standard models
    db = Vectrix("docs", model="sentence-transformers/all-MiniLM-L6-v2")
    db = Vectrix("docs", model="sentence-transformers/all-mpnet-base-v2")

    # BGE models (high quality)
    db = Vectrix("docs", model="BAAI/bge-small-en-v1.5")
    db = Vectrix("docs", model="BAAI/bge-large-en-v1.5")

    # Multilingual
    db = Vectrix("docs", model="sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2")

    # pip install openai
    from vectrixdb import Vectrix
    from openai import OpenAI

    client = OpenAI()  # Uses OPENAI_API_KEY env var

    def openai_embed(texts):
        response = client.embeddings.create(
            model="text-embedding-3-small",
            input=texts
        )
        return [r.embedding for r in response.data]

    db = Vectrix(
        "docs",
        tier="hybrid",
        embed_fn=openai_embed,
        dimension=1536
    )
    db.add(["Document 1", "Document 2"])
    results = db.search("query")
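
    # pip install cohere
    from vectrixdb import Vectrix
    import cohere

    co = cohere.Client("YOUR_API_KEY")

    def cohere_embed(texts):
        # Cohere v3 embedding models require input_type;
        # "search_document" suits indexed text ("search_query" is meant for queries)
        response = co.embed(
            texts=texts,
            model="embed-english-v3.0",
            input_type="search_document"
        )
        return response.embeddings

    db = Vectrix(
        "docs",
        tier="hybrid",
        embed_fn=cohere_embed,
        dimension=1024
    )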
Use embedders directly for custom pipelines.

    from vectrixdb.models import (
        DenseEmbedder,            # Dense vectors (384 dim)
        SparseEmbedder,           # BM25 sparse vectors
        RerankerEmbedder,         # Cross-encoder reranking
        LateInteractionEmbedder,  # ColBERT/BGE-M3 (128/1024 dim)
        GraphExtractor,           # Knowledge triplet extraction
    )

    from vectrixdb.models import DenseEmbedder

    # English (bundled, faster)
    embedder = DenseEmbedder(language="en")

    # Multilingual (auto-download)
    embedder = DenseEmbedder()

    # Generate embeddings
    vectors = embedder.embed(["Hello world", "How are you"])
    print(vectors.shape)  # (2, 384)
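Dense vectors are typically compared with cosine similarity, the metric named in the collection-creation request on this page. The sketch below computes it in plain Python on toy 2-dimensional vectors, so it stands apart from the embedder's actual output and from however VectrixDB computes similarity internally.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Toy vectors: identical direction, orthogonal, and same direction at a different scale
same = cosine_similarity([1.0, 0.0], [1.0, 0.0])
orthogonal = cosine_similarity([1.0, 0.0], [0.0, 1.0])
scaled = cosine_similarity([1.0, 2.0], [2.0, 4.0])
print(same, orthogonal, round(scaled, 6))
```

Because the score depends only on direction, scaling a vector does not change its similarity to another vector.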

    from vectrixdb.models import RerankerEmbedder

    reranker = RerankerEmbedder(language="en")

    docs = [
        "AI is artificial intelligence",
        "The weather is sunny",
        "Machine learning is AI",
    ]
    scores = reranker.score("What is AI?", docs)
    print(scores)  # e.g. [0.99, 0.01, 0.87]

    # Rerank and get sorted results
    ranked = reranker.rerank("query", docs, top_k=5)

    from vectrixdb.models import LateInteractionEmbedder

    # English ColBERT (bundled, 128 dim)
    late = LateInteractionEmbedder(language="en")

    # Multilingual BGE-M3 (auto-download, 1024 dim)
    late = LateInteractionEmbedder()

    # Encode query and document
    query_emb = late.encode_query("What is machine learning?")
    doc_emb = late.encode_document("Machine learning is a subset of AI...")

    # MaxSim scoring
    score = late.max_sim(query_emb, doc_emb)
    print(f"MaxSim: {score:.4f}")
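MaxSim scoring in the ColBERT style takes each query token's best match among the document tokens and sums those maxima. The sketch below shows the arithmetic on made-up 2-dimensional token vectors. Whether `max_sim` in VectrixDB sums, averages, or normalizes is not stated on this page, so treat this as the general technique only.

```python
def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def max_sim(query_tokens, doc_tokens):
    """Sum, over query tokens, of the highest dot product with any document token."""
    return sum(max(dot(q, d) for d in doc_tokens) for q in query_tokens)


# Toy token embeddings (one vector per token), invented for illustration
query_tokens = [[1.0, 0.0], [0.0, 1.0]]
doc_tokens = [[0.5, 0.5], [1.0, 0.0], [0.0, 0.2]]

score = max_sim(query_tokens, doc_tokens)
print(f"MaxSim: {score:.4f}")
```

Here the first query token matches the second document token exactly (1.0) and the second query token's best match scores 0.5, giving a total of 1.5.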

    from vectrixdb.models import GraphExtractor

    extractor = GraphExtractor()
    triplets = extractor.extract("CRISPR-Cas9 can edit human DNA.")
    for t in triplets:
        print(f"{t.head} --[{t.relation}]--> {t.tail}")
    # Output: CRISPR-Cas9 --[can edit]--> human DNA
Build RAG agents with VectrixDB as the retriever.

    from vectrixdb import Vectrix
    from langchain_core.retrievers import BaseRetriever
    from langchain_core.documents import Document
    from pydantic import Field
    from typing import List, Optional

    class VectrixRetriever(BaseRetriever):
        """LangChain retriever for VectrixDB."""

        db: Vectrix = Field(description="VectrixDB instance")
        k: int = Field(default=4)
        mode: str = Field(default="hybrid")

        class Config:
            arbitrary_types_allowed = True

        def _get_relevant_documents(self, query: str, **kwargs) -> List[Document]:
            results = self.db.search(query, mode=self.mode, limit=self.k)
            return [
                Document(
                    page_content=r.text,
                    metadata={"id": r.id, "score": r.score, **r.metadata}
                )
                for r in results
            ]

    from vectrixdb import Vectrix
    from langchain_openai import ChatOpenAI
    from langchain.chains import create_retrieval_chain
    from langchain.chains.combine_documents import create_stuff_documents_chain
    from langchain_core.prompts import ChatPromptTemplate

    # Create VectrixDB + retriever
    db = Vectrix("docs", tier="hybrid")
    db.add(["Your documents..."])
    retriever = VectrixRetriever(db=db, k=4, mode="hybrid")

    # Create RAG chain
    llm = ChatOpenAI(model="gpt-4o-mini")
    prompt = ChatPromptTemplate.from_messages([
        ("system", "Answer based on context:\n{context}"),
        ("human", "{input}")
    ])
    doc_chain = create_stuff_documents_chain(llm, prompt)
    rag_chain = create_retrieval_chain(retriever, doc_chain)

    # Use it
    result = rag_chain.invoke({"input": "What is...?"})
    print(result["answer"])
- Requires OpenAI embeddings
- Single search mode
- No built-in reranking
- Bundled embeddings (free!)
- Dense, sparse, hybrid, ultimate
- Built-in cross-encoder reranking

    # Install
    pip install vectrixdb

    # Set API key (enables write operations)
    # Linux/Mac:
    export VECTRIXDB_API_KEY="your-secret-key"
    # Windows:
    set VECTRIXDB_API_KEY=your-secret-key

    # Optional: Read-only API key (view but not modify)
    export VECTRIXDB_READ_ONLY_API_KEY="viewer-key"

    # Download models (optional - English models bundled)
    vectrixdb download-models                          # All multilingual
    vectrixdb download-models --type dense             # Dense only
    vectrixdb download-models --type late_interaction  # BGE-M3

    # Start server with dashboard (CLI)
    vectrixdb serve --port 7337

    # Start server without dashboard (CLI)
    vectrixdb serve --port 7337 --no-dashboard

    # Start server programmatically (Python)
    from vectrixdb.api.server import run_server
    run_server(host="0.0.0.0", port=7337, db_path="./vectrixdb_data")

    # Or with uvicorn directly
    import uvicorn
    from vectrixdb.api.server import create_app
    app = create_app(db_path="./vectrixdb_data")
    uvicorn.run(app, host="0.0.0.0", port=7337)

    # Database info
    vectrixdb info ./vectrixdb_data
    vectrixdb list ./vectrixdb_data

    # Check models
    vectrixdb models-info

Documents
Hierarchical document index for page/section navigation