Metadata-Version: 2.4
Name: langchain-velesdb
Version: 1.14.2
Summary: LangChain VectorStore for VelesDB: The Local AI Memory Database. Microsecond RAG retrieval.
Author-email: VelesDB Team <contact@wiscale.fr>
License: MIT
Project-URL: Homepage, https://github.com/cyberlife-coder/VelesDB
Project-URL: Documentation, https://velesdb.com/docs/integrations/langchain
Project-URL: Repository, https://github.com/cyberlife-coder/VelesDB
Keywords: langchain,velesdb,vector-database,embeddings,rag,local-first,ai-memory,semantic-search,llm,chatgpt
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Requires-Python: >=3.9
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: langchain-core<1.0.0,>=0.1.0
Requires-Dist: velesdb>=1.12.0
Requires-Dist: velesdb-common>=1.12.0
Provides-Extra: dev
Requires-Dist: pytest<9.0,>=7.0; extra == "dev"
Requires-Dist: pytest-asyncio<1.0.0,>=0.21; extra == "dev"
Requires-Dist: langchain<1.0.0,>=0.1.0; extra == "dev"
Dynamic: license-file

# langchain-velesdb

LangChain integration for the [VelesDB](https://github.com/cyberlife-coder/VelesDB) vector database.

## Installation

```bash
pip install langchain-velesdb
```

## Quick Start

```python
from langchain_velesdb import VelesDBVectorStore
from langchain_openai import OpenAIEmbeddings

# Initialize vector store
vectorstore = VelesDBVectorStore(
    path="./my_vectors",
    collection_name="documents",
    embedding=OpenAIEmbeddings()
)

# Add documents
vectorstore.add_texts([
    "VelesDB is a high-performance vector database",
    "Built entirely in Rust for speed and safety",
    "Perfect for RAG applications and semantic search"
])

# Search
results = vectorstore.similarity_search("fast database", k=2)
for doc in results:
    print(doc.page_content)
```

## Usage with RAG

```python
from langchain_velesdb import VelesDBVectorStore
from langchain_openai import ChatOpenAI, OpenAIEmbeddings
from langchain.chains import RetrievalQA

# Create vector store with documents
vectorstore = VelesDBVectorStore.from_texts(
    texts=["Document 1 content", "Document 2 content"],
    embedding=OpenAIEmbeddings(),
    path="./rag_data",
    collection_name="knowledge_base"
)

# Create RAG chain
retriever = vectorstore.as_retriever(search_kwargs={"k": 3})
qa_chain = RetrievalQA.from_chain_type(
    llm=ChatOpenAI(),
    chain_type="stuff",
    retriever=retriever
)

# Ask questions (invoke() replaces the deprecated run(); the answer is under "result")
response = qa_chain.invoke({"query": "What is VelesDB?"})
print(response["result"])
```

## API Reference

### VelesDBVectorStore

```python
VelesDBVectorStore(
    embedding: Embeddings,
    path: str = "./velesdb_data",
    collection_name: str = "langchain",
    metric: str = "cosine",         # "cosine", "euclidean", "dot" (aliases: "dotproduct", "inner", "ip"), "hamming", "jaccard"
    storage_mode: str = "full",     # "full"/"f32", "sq8"/"int8" (4× compression), "binary"/"bit" (32× compression), "pq" (8-32× compression), "rabitq" (32× with scalar correction)
    search_quality: Optional[str] = None,  # "fast", "balanced", "accurate", "perfect", "autotune", "custom:N", "adaptive:MIN:MAX"
)
```

#### Methods

**Core Operations:**
- `add_texts(texts, metadatas=None, ids=None)` - Add texts to the store
- `add_texts_bulk(texts, metadatas=None, ids=None)` - Bulk insert (2-3x faster for large batches)
- `delete(ids)` - Delete documents by ID
- `get_by_ids(ids)` - Retrieve documents by their IDs
- `flush()` - Flush pending changes to disk

**Search:**
- `similarity_search(query, k=4)` - Search for similar documents
- `similarity_search_with_score(query, k=4)` - Search with similarity scores
- `similarity_search_with_filter(query, k=4, filter=None)` - Search with metadata filtering
- `batch_search(queries, k=4)` - Batch search multiple queries in parallel
- `batch_search_with_score(queries, k=4)` - Batch search with scores
- `multi_query_search(queries, k=4, fusion="rrf", ...)` - **Multi-query fusion search**
- `multi_query_search_with_score(queries, k=4, ...)` - Multi-query search with fused scores
- `hybrid_search(query, k=4, vector_weight=0.5, filter=None)` - Hybrid vector + BM25 search; returns `(document, score)` pairs
- `text_search(query, k=4, filter=None)` - Full-text BM25 search; returns `(document, score)` pairs
- `query(query_str, params=None)` - Execute VelesQL query

**Utilities:**
- `as_retriever(**kwargs)` - Convert to LangChain retriever
- `from_texts(texts, embedding, ...)` - Create store from texts (class method)
- `get_collection_info()` - Get collection metadata (name, dimension, point_count)
- `is_empty()` - Check if collection is empty
- `scroll(batch_size=100, filter=None)` - Iterate over all points in stable batches without a query vector

## Advanced Features

### Multi-Query Fusion (MQG)

Search with several reformulations of the same query and fuse the per-query results into a single ranking.
This suits RAG pipelines that use Multiple Query Generation (MQG).

```python
# Basic usage with RRF (Reciprocal Rank Fusion)
results = vectorstore.multi_query_search(
    queries=["travel to Greece", "Greek vacation", "Athens trip"],
    k=10,
)

# With weighted fusion (like SearchXP's scoring)
results = vectorstore.multi_query_search(
    queries=["travel Greece", "vacation Mediterranean"],
    k=10,
    fusion="weighted",
    fusion_params={
        "avg_weight": 0.6,   # Average score weight
        "max_weight": 0.3,   # Maximum score weight  
        "hit_weight": 0.1,   # Hit ratio weight
    }
)

# Get fused scores
results_with_scores = vectorstore.multi_query_search_with_score(
    queries=["query1", "query2", "query3"],
    k=5,
    fusion="rrf",
    fusion_params={"k": 60}  # RRF parameter
)
for doc, score in results_with_scores:
    print(f"{score:.3f}: {doc.page_content}")
```
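
To make the `weighted` parameters concrete, the following is a minimal sketch of how an average score, a maximum score, and a hit ratio could be combined per document. It is an illustration only, not VelesDB's implementation: the function name `weighted_fuse`, the input shape, and the choice to average only over the queries that returned a document are all assumptions.

```python
from typing import Dict, List, Tuple


def weighted_fuse(
    per_query_scores: List[Dict[str, float]],
    avg_weight: float = 0.6,
    max_weight: float = 0.3,
    hit_weight: float = 0.1,
) -> List[Tuple[str, float]]:
    """Hypothetical weighted fusion over per-query {doc_id: score} maps."""
    n_queries = len(per_query_scores)

    # Collect every score each document received across the queries.
    collected: Dict[str, List[float]] = {}
    for scores in per_query_scores:
        for doc_id, score in scores.items():
            collected.setdefault(doc_id, []).append(score)

    fused = []
    for doc_id, scores in collected.items():
        avg_score = sum(scores) / len(scores)   # assumption: mean over hits only
        max_score = max(scores)
        hit_ratio = len(scores) / n_queries     # share of queries that returned the doc
        fused.append(
            (doc_id, avg_weight * avg_score + max_weight * max_score + hit_weight * hit_ratio)
        )

    # Highest fused score first; ties broken by ID for determinism.
    fused.sort(key=lambda item: (-item[1], item[0]))
    return fused


weighted = weighted_fuse([{"a": 0.9, "b": 0.5}, {"a": 0.7}])
for doc_id, score in weighted:
    print(f"{score:.3f}: {doc_id}")
```

Under these assumptions, a document returned by every query gains from the hit-ratio term even when its individual scores are moderate.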

**Fusion Strategies:**
- `"rrf"` - Reciprocal Rank Fusion (default, robust to score scale differences)
- `"average"` - Mean score across all queries
- `"maximum"` - Maximum score from any query
- `"weighted"` - Custom combination of avg, max, and hit ratio
- `"relative_score"` - Linear blend of dense and sparse scores
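
As a rough illustration of why RRF is robust to score scale differences, here is a minimal pure-Python sketch of the standard formula, in which each document receives `1 / (k + rank)` from every query that returned it. Only ranks are used, never raw scores. The helper `rrf_fuse` is hypothetical and works on plain ID lists; VelesDB's native implementation may differ in details such as tie-breaking.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def rrf_fuse(
    rankings: Sequence[Sequence[str]], k: int = 60, top_n: int = 10
) -> List[Tuple[str, float]]:
    """Fuse ranked ID lists with Reciprocal Rank Fusion."""
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        # Ranks are 1-based: the top result of each query contributes 1 / (k + 1).
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)

    # Highest fused score first; ties broken by ID for determinism.
    fused = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return fused[:top_n]


# One ranked list of document IDs per query reformulation.
rankings = [
    ["a", "b", "c"],
    ["b", "a", "d"],
    ["b", "c", "a"],
]
fused = rrf_fuse(rankings, k=60, top_n=3)
for doc_id, score in fused:
    print(f"{score:.4f}: {doc_id}")
```

A larger `k` flattens the difference between adjacent ranks, so agreement across queries matters more than a single top position.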

```python
# Relative Score Fusion — explicit control over dense vs sparse weight
results = vectorstore.multi_query_search(
    queries=["semantic search", "keyword retrieval"],
    k=10,
    fusion="relative_score",
    fusion_params={"dense_weight": 0.7, "sparse_weight": 0.3}
)
```

### Advanced Search

#### `search_quality` — Quality Presets

Control the recall/latency trade-off for all similarity searches with a single
parameter set at construction time or overridden per-call.

```python
# Set once on the store — applies to every similarity_search call
vectorstore = VelesDBVectorStore(
    embedding=OpenAIEmbeddings(),
    path="./data",
    search_quality="accurate",   # higher recall at the cost of latency
)

results = vectorstore.similarity_search("machine learning", k=10)

# Override per-call via kwargs
results = vectorstore.similarity_search_with_score(
    "machine learning", k=10, search_quality="fast",
)
```

Accepted values:

| Value | Description |
|-------|-------------|
| `"fast"` | Lowest latency, reduced recall |
| `"balanced"` | Balanced latency/recall |
| `"accurate"` | Higher recall, higher latency |
| `"perfect"` | Exhaustive search, maximum recall |
| `"autotune"` | Runtime-adaptive quality |
| `"custom:N"` | Explicit ef_search (e.g. `"custom:256"`) |
| `"adaptive:MIN:MAX"` | Adaptive ef range (e.g. `"adaptive:32:512"`) |
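
The string forms in the table can be validated before they reach the store. The parser below is a hypothetical sketch, not the library's own parsing code: the name `parse_search_quality`, the returned dictionary keys, and the constraints that `N` is positive and `MIN <= MAX` are assumptions.

```python
from typing import Dict, Union

# Preset names taken from the table of accepted values.
PRESETS = {"fast", "balanced", "accurate", "perfect", "autotune"}


def parse_search_quality(value: str) -> Dict[str, Union[str, int]]:
    """Split a search_quality string into a mode and its numeric parameters."""
    if value in PRESETS:
        return {"mode": value}

    parts = value.split(":")
    if parts[0] == "custom" and len(parts) == 2:
        ef_search = int(parts[1])  # int() raises ValueError on non-numeric input
        if ef_search <= 0:
            raise ValueError("custom ef_search must be positive")
        return {"mode": "custom", "ef_search": ef_search}

    if parts[0] == "adaptive" and len(parts) == 3:
        min_ef, max_ef = int(parts[1]), int(parts[2])
        if not 0 < min_ef <= max_ef:
            raise ValueError("adaptive range must satisfy 0 < MIN <= MAX")
        return {"mode": "adaptive", "min_ef": min_ef, "max_ef": max_ef}

    raise ValueError(f"unrecognized search_quality: {value!r}")


print(parse_search_quality("custom:256"))
print(parse_search_quality("adaptive:32:512"))
```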

#### `similarity_search_with_ef(query, ef_search, k)`

Search with an explicit HNSW `ef_search` parameter to trade query latency for recall.
Higher `ef_search` increases recall at the cost of slower search.

```python
# Use a high ef_search for higher recall, at the cost of a slower query
results = vectorstore.similarity_search_with_ef(
    query="machine learning",
    ef_search=256,
    k=10
)
```
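
One way to choose an `ef_search` value is to measure recall against a trusted baseline, for example results obtained with `search_quality="perfect"`. The helper below is a hypothetical sketch (it is not part of `langchain-velesdb`) that computes recall@k from two lists of document IDs; how those IDs are collected is left to the caller.

```python
from typing import Sequence


def recall_at_k(approx_ids: Sequence[str], exact_ids: Sequence[str], k: int) -> float:
    """Fraction of the exact top-k results that the approximate search also returned."""
    exact_top = set(exact_ids[:k])
    if not exact_top:
        return 0.0
    approx_top = set(approx_ids[:k])
    return len(approx_top & exact_top) / len(exact_top)


# IDs from an exhaustive search (baseline) and from a lower-ef search (illustrative values).
exact = ["d1", "d2", "d3", "d4"]
approx = ["d1", "d3", "d9", "d2"]
print(recall_at_k(approx, exact, k=4))
```

Repeating the measurement over a sample of real queries at several `ef_search` values shows where extra latency stops buying recall.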

### Server Mode: URL Validation

When connecting to a remote `velesdb-server` via the `url` parameter,
`validate_url` is called automatically during initialization to reject
malformed URLs before any network request is issued.
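
For readers who want a similar pre-flight check in their own configuration code, here is a minimal sketch using only the standard library. It is not the package's `validate_url`; the function name `check_server_url`, the restriction to `http`/`https`, and the example address are assumptions.

```python
from urllib.parse import urlsplit


def check_server_url(url: str) -> str:
    """Reject obviously malformed server URLs without touching the network."""
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https"):
        raise ValueError(f"unsupported or missing scheme in {url!r}")
    if not parts.hostname:
        raise ValueError(f"missing host in {url!r}")
    # Accessing .port raises ValueError for non-numeric or out-of-range ports.
    _ = parts.port
    return url


# Hypothetical address, used only for illustration.
print(check_server_url("http://localhost:8080"))
```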

### Hybrid Search (Vector + BM25)

```python
# Combine vector similarity with keyword matching
results = vectorstore.hybrid_search(
    query="machine learning performance",
    k=5,
    vector_weight=0.7  # 70% vector, 30% BM25
)
for doc, score in results:
    print(f"{score:.3f}: {doc.page_content}")
```
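
The `vector_weight=0.7` comment above describes a 70/30 split. A minimal sketch of one way such a blend can work is shown below, assuming both score sets are min-max normalized before mixing. The normalization step and the names `min_max` and `hybrid_blend` are assumptions for illustration; the source does not specify how VelesDB normalizes or combines the two scores.

```python
from typing import Dict, List, Tuple


def min_max(scores: Dict[str, float]) -> Dict[str, float]:
    """Rescale scores to [0, 1] so vector and BM25 scores become comparable."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc_id: 1.0 for doc_id in scores}
    return {doc_id: (s - lo) / (hi - lo) for doc_id, s in scores.items()}


def hybrid_blend(
    vector_scores: Dict[str, float],
    bm25_scores: Dict[str, float],
    vector_weight: float = 0.5,
) -> List[Tuple[str, float]]:
    """Linear blend: vector_weight for vectors, the remainder for BM25."""
    if not 0.0 <= vector_weight <= 1.0:
        raise ValueError("vector_weight must be between 0 and 1")
    v, b = min_max(vector_scores), min_max(bm25_scores)
    fused = [
        # A document missing from one result set contributes 0 for that side.
        (doc_id, vector_weight * v.get(doc_id, 0.0) + (1.0 - vector_weight) * b.get(doc_id, 0.0))
        for doc_id in set(v) | set(b)
    ]
    fused.sort(key=lambda item: (-item[1], item[0]))
    return fused


blended = hybrid_blend(
    {"a": 0.9, "b": 0.6, "c": 0.3},   # cosine-style similarities
    {"b": 12.0, "c": 4.0},            # unbounded BM25 scores
    vector_weight=0.7,
)
for doc_id, score in blended:
    print(f"{score:.3f}: {doc_id}")
```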

### Full-Text Search (BM25)

```python
# Pure keyword-based search
results = vectorstore.text_search("VelesDB Rust", k=5)
for doc, score in results:
    print(f"{score:.3f}: {doc.page_content}")
```

### Metadata Filtering

```python
# Search with filters
results = vectorstore.similarity_search_with_filter(
    query="database",
    k=5,
    filter={"condition": {"type": "eq", "field": "category", "value": "tech"}}
)
```
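
The filter is a plain dictionary, so it can be built programmatically. The helpers below are hypothetical (`eq_filter` and `matches_eq` are not part of the package) and cover only the `eq` condition shown above. `matches_eq` is a client-side illustration of what that condition is assumed to express, not how VelesDB evaluates it.

```python
from typing import Any, Dict


def eq_filter(field: str, value: Any) -> Dict[str, Any]:
    """Build the equality filter structure used in the example above."""
    return {"condition": {"type": "eq", "field": field, "value": value}}


def matches_eq(metadata: Dict[str, Any], flt: Dict[str, Any]) -> bool:
    """Check locally whether a metadata dict satisfies an 'eq' filter."""
    condition = flt["condition"]
    if condition["type"] != "eq":
        raise NotImplementedError("this sketch only handles 'eq' conditions")
    return metadata.get(condition["field"]) == condition["value"]


tech_only = eq_filter("category", "tech")
print(tech_only)
print(matches_eq({"category": "tech", "year": 2024}, tech_only))
```

The resulting dictionary can be passed as the `filter` argument of `similarity_search_with_filter`.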

### Cross-Collection MATCH

Use the `query()` method with the `_collection` parameter to run MATCH queries
that enrich results with data from other collections. Nodes annotated with
`@collection` in the MATCH pattern have their payloads looked up from the named
collection after traversal.

```python
# Enrich Product nodes with Inventory data from the 'inventory' collection
results = vectorstore.query(
    "MATCH (p:Product)-[:STORED_IN]->(inv:Inventory@inventory) "
    "RETURN p.name, inv.price, inv.stock LIMIT 20",
    params={"_collection": "catalog_graph"}
)
for row in results:
    print(row["p.name"], row["inv.price"])
```

## Features

- **High Performance**: VelesDB's Rust backend delivers sub-millisecond latencies
- **SIMD Optimized**: Hardware-accelerated vector operations  
- **Multi-Query Fusion**: Native support for MQG pipelines with RRF/Weighted fusion
- **Hybrid Search**: Combine vector similarity with BM25 text matching
- **Full-Text Search**: BM25 ranking for keyword queries
- **Metadata Filtering**: Filter results by document attributes
- **Simple Setup**: Self-contained single binary, no external services required
- **LangChain Compatibility**: Implements the LangChain `VectorStore` interface, so it works with chains and agents that accept a vector store or retriever

## License

MIT License (this integration). See [LICENSE](https://github.com/cyberlife-coder/VelesDB/blob/main/integrations/langchain/LICENSE) for details.

VelesDB Core itself is licensed under the [VelesDB Core License 1.0](https://github.com/cyberlife-coder/VelesDB/blob/main/LICENSE) (based on ELv2).
