Metadata-Version: 2.4
Name: vectra-rag-py
Version: 1.0.0
Summary: A production-ready, provider-agnostic Python SDK for End-to-End RAG pipelines.
Author-email: Abhishek N <astroabhi.abhi@gmail.com>
License: MIT
Project-URL: Homepage, https://github.com/iamabhishek-n/vectra-py
Project-URL: Issues, https://github.com/iamabhishek-n/vectra-py/issues
Keywords: rag,llm,openai,anthropic,gemini,vector-database,vectrasdk
Classifier: Programming Language :: Python :: 3
Classifier: License :: OSI Approved :: GNU General Public License v3 (GPLv3)
Classifier: Operating System :: OS Independent
Requires-Python: >=3.8
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: pypdf
Requires-Dist: mammoth
Requires-Dist: openpyxl
Requires-Dist: openai
Requires-Dist: google-genai
Requires-Dist: anthropic
Requires-Dist: pydantic
Requires-Dist: prisma
Requires-Dist: chromadb
Requires-Dist: requests
Requires-Dist: asyncpg
Requires-Dist: aiohttp
Requires-Dist: pysbd
Dynamic: license-file

# Vectra (Python)

**Vectra** is a **production-grade, provider-agnostic Python SDK** for building **end-to-end Retrieval-Augmented Generation (RAG)** systems. It is designed for teams that need **correctness, extensibility, async performance, and observability** across embeddings, vector databases, retrieval strategies, and LLM providers.

![PyPI - Downloads](https://img.shields.io/pypi/dm/vectra-rag-py)
![GitHub Release](https://img.shields.io/github/v/release/iamabhishek-n/vectra-py)
[![Quality Gate Status](https://sonarcloud.io/api/project_badges/measure?project=iamabhishek-n_vectra-py&metric=alert_status)](https://sonarcloud.io/summary/new_code?id=iamabhishek-n_vectra-py)

If you find this project useful, consider supporting it:<br>
[![Star this project on GitHub](https://img.shields.io/github/stars/iamabhishek-n/vectra-py?style=social)](https://github.com/iamabhishek-n/vectra-py/stargazers)
[![Sponsor me on GitHub](https://img.shields.io/badge/Sponsor%20me%20on-GitHub-%23FFD43B?logo=github)](https://github.com/sponsors/iamabhishek-n)
[![Buy me a Coffee](https://img.shields.io/badge/Buy%20me%20a%20Coffee-%23FFDD00?logo=buy-me-a-coffee&logoColor=black)](https://www.buymeacoffee.com/iamabhishekn)


## Table of Contents

* [1. Overview](#1-overview)
* [2. Design Goals & Philosophy](#2-design-goals--philosophy)
* [3. Feature Matrix](#3-feature-matrix)
* [4. Installation](#4-installation)
* [5. Quick Start](#5-quick-start)
* [6. Core Concepts](#6-core-concepts)
  * [Providers](#providers)
  * [Vector Stores](#vector-stores)
  * [Chunking](#chunking)
  * [Retrieval](#retrieval)
  * [Reranking](#reranking)
  * [Metadata Enrichment](#metadata-enrichment)
  * [Query Planning & Grounding](#query-planning--grounding)
  * [Conversation Memory](#conversation-memory)
* [7. Configuration Reference (Usage-Driven)](#7-configuration-reference-usage-driven)
* [8. Ingestion Pipeline](#8-ingestion-pipeline)
* [9. Querying & Streaming](#9-querying--streaming)
* [10. Conversation Memory](#10-conversation-memory)
* [11. Evaluation & Quality Measurement](#11-evaluation--quality-measurement)
* [12. CLI](#12-cli)
  * [Ingest & Query](#ingest--query)
  * [WebConfig (Config Generator UI)](#webconfig-config-generator-ui)
  * [Observability Dashboard](#observability-dashboard)
* [13. Observability & Callbacks](#13-observability--callbacks)
* [14. Telemetry](#14-telemetry)
* [15. Database Schemas & Indexing](#15-database-schemas--indexing)
* [16. Extending Vectra](#16-extending-vectra)
* [17. Architecture Overview](#17-architecture-overview)
* [18. Development & Contribution Guide](#18-development--contribution-guide)
* [19. Production Best Practices](#19-production-best-practices)

---

## 1. Overview

Vectra implements a **fully modular RAG pipeline**:

```
Load → Chunk → Embed → Store → Retrieve → Rerank → Plan → Ground → Generate → Stream
```
<p align="center">
  <img src="https://vectra.thenxtgenagents.com/vectraArch.png" alt="Vectra SDK Architecture" width="900">
</p>

<p align="center">
  <em>Vectra SDK – End-to-End RAG Architecture</em>
</p>

All stages are **explicitly configured**, **async-first**, and **observable**.

### Key Characteristics

* Async-first API (`asyncio`)
* Provider-agnostic embeddings & LLMs
* Multiple vector backends (Postgres, Chroma, Qdrant, Milvus)
* Advanced retrieval (HyDE, Multi-Query, Hybrid RRF, MMR)
* Unified streaming interface
* Built-in evaluation and observability
* CLI + SDK parity

---

## 2. Design Goals & Philosophy

### Explicitness over Magic

Vectra avoids hidden defaults. Chunking, retrieval, grounding, memory, and generation behavior are always explicit and validated.

### Production-First

Index helpers, rate limiting, embedding cache, observability, and evaluation are first-class features.

### Provider Neutrality

Switching providers (OpenAI ↔ Gemini ↔ Anthropic ↔ Ollama) requires **no application code changes**.

### Extensibility

All major subsystems are interface-driven and designed to be extended safely.

---

## 3. Feature Matrix

### Providers

* **Embeddings**: OpenAI, Gemini, Ollama, HuggingFace
* **Generation**: OpenAI, Gemini, Anthropic, Ollama, OpenRouter, HuggingFace
* **Streaming**: Async generators with normalized output

### Vector Stores

* PostgreSQL (Prisma + pgvector)
* ChromaDB
* Qdrant
* Milvus

### Retrieval Strategies

* Naive cosine similarity
* HyDE (Hypothetical Document Embeddings)
* Multi-Query expansion (RRF)
* Hybrid semantic + lexical (RRF)
* MMR diversification
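Naive retrieval ranks stored chunks by cosine similarity to the query embedding and keeps the top `k`. The pure-Python sketch below shows the idea on toy 2-D vectors. `cosine_similarity` and `top_k` are hypothetical helper names, not Vectra's API. In practice the vector backends (pgvector, Chroma, Qdrant, Milvus) compute this inside the database.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either has zero length)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query: Sequence[float], docs: List[Tuple[str, Sequence[float]]],
          k: int) -> List[Tuple[str, float]]:
    """Return the k (doc_id, score) pairs most similar to the query, best first."""
    scored = [(doc_id, cosine_similarity(query, vec)) for doc_id, vec in docs]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


docs = [("refunds", [1.0, 0.0]), ("shipping", [0.0, 1.0]), ("returns", [0.8, 0.6])]
print(top_k([1.0, 0.1], docs, k=2))
```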

---

## 4. Installation

### Library

```bash
pip install vectra-rag-py
# or
uv pip install vectra-rag-py
```

### Backends

```bash
# Prisma Client Python – https://prisma.brendonovich.dev
pip install prisma
# ChromaDB – https://docs.trychroma.com
pip install chromadb
# Qdrant Python Client – https://qdrant.tech/documentation
pip install qdrant-client
# Milvus Python SDK – https://milvus.io/docs
pip install pymilvus
```


### CLI

```bash
vectra --help
# alternative
python -m vectra.cli --help
```

### Requirements

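Core dependencies (installed automatically with the package):
`pypdf`, `mammoth`, `openpyxl`, `openai`, `google-genai`, `anthropic`, `pydantic`, `prisma`, `chromadb`, `requests`, `asyncpg`, `aiohttp`, `pysbd`

`asyncio` ships with the Python standard library. `qdrant-client` and `pymilvus` are not core dependencies; install them as shown under Backends if you use those stores.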

---

## 5. Quick Start

```python
import asyncio
import os

import asyncpg
from vectra import VectraClient, VectraConfig, ProviderType


async def main():
    # asyncpg connection pool used by the native Postgres vector store
    pool = await asyncpg.create_pool(os.getenv('DATABASE_URL'))
    try:
        config = VectraConfig(
            embedding={
                'provider': ProviderType.OPENAI,
                'api_key': os.getenv('OPENAI_API_KEY'),
                'model_name': 'text-embedding-3-small'
            },
            llm={
                'provider': ProviderType.GEMINI,
                'api_key': os.getenv('GOOGLE_API_KEY'),
                'model_name': 'gemini-2.5-flash'
            },
            database={
                'type': 'postgres',
                'client_instance': pool,
                'table_name': 'document',
                'column_map': { 'content': 'content', 'metadata': 'metadata', 'vector': 'vector' }
            }
        )

        client = VectraClient(config)
        # Load, chunk, embed, and store every supported file under ./docs
        await client.ingest_documents('./docs')
        # Retrieve relevant chunks and generate an answer
        result = await client.query_rag('What is the vacation policy?')
        print(result['answer'])
    finally:
        await pool.close()


asyncio.run(main())
```

---

## 6. Core Concepts

### Providers

Providers implement embeddings, generation, or both. Vectra normalizes responses and streaming across providers.

### Vector Stores

Vector stores persist embeddings and metadata. Backends are swappable via configuration.

### Chunking

* **Recursive**: Token-aware, separator-aware splitting
* **Agentic**: LLM-driven semantic propositions
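Recursive splitting tries coarse separators (paragraph breaks) first and falls back to finer ones (line breaks, spaces, then raw characters) only for pieces that are still too long, merging small neighbours back up to the size limit. The sketch below is a simplified illustration: it measures length in characters, whereas Vectra's recursive chunker is described as token-aware, and it omits overlap. `recursive_split` is a hypothetical helper, not part of the SDK.

```python
from typing import List, Sequence

# Coarse to fine; the final "" means "hard cut by characters".
DEFAULT_SEPARATORS = ("\n\n", "\n", " ", "")


def recursive_split(text: str, chunk_size: int,
                    separators: Sequence[str] = DEFAULT_SEPARATORS) -> List[str]:
    """Split text into pieces of at most chunk_size characters, preferring coarse separators.

    `separators` must end with "" so there is always a fallback.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if len(text) <= chunk_size:
        return [text] if text else []
    sep, rest = separators[0], separators[1:]
    if sep == "":
        # Last resort: hard cut every chunk_size characters.
        return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]
    chunks: List[str] = []
    current = ""
    for part in text.split(sep):
        candidate = part if not current else current + sep + part
        if len(candidate) <= chunk_size:
            current = candidate  # Keep merging small pieces into the current chunk.
            continue
        if current:
            chunks.append(current)
        if len(part) <= chunk_size:
            current = part
        else:
            # The piece alone is too long: recurse with the next, finer separator.
            chunks.extend(recursive_split(part, chunk_size, rest))
            current = ""
    if current:
        chunks.append(current)
    return chunks
```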

### Retrieval

Configurable strategies to balance recall, precision, and latency.
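Of the strategies listed in the Feature Matrix, MMR (Maximal Marginal Relevance) is the one that explicitly trades relevance against redundancy. At each step it picks the candidate maximizing `lambda * sim(query, d) - (1 - lambda) * max sim(d, s)` over already-selected documents `s`. A minimal greedy sketch follows. The names `mmr` and `lambda_mult` are illustrative assumptions; this README does not document which MMR parameters, if any, Vectra exposes.

```python
import math
from typing import Dict, List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity, 0.0 when either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def mmr(query: Sequence[float], candidates: Dict[str, Sequence[float]],
        k: int, lambda_mult: float = 0.5) -> List[str]:
    """Greedily pick k candidate IDs by Maximal Marginal Relevance."""
    selected: List[str] = []

    def mmr_score(doc_id: str) -> float:
        vec = candidates[doc_id]
        relevance = cosine(query, vec)
        # Redundancy: similarity to the closest document already selected.
        redundancy = max((cosine(vec, candidates[s]) for s in selected), default=0.0)
        return lambda_mult * relevance - (1 - lambda_mult) * redundancy

    remaining = list(candidates)
    while remaining and len(selected) < k:
        best = max(remaining, key=mmr_score)
        selected.append(best)
        remaining.remove(best)
    return selected


candidates = {"a": [1.0, 0.0], "a_copy": [1.0, 0.0], "b": [0.6, 0.8]}
print(mmr([1.0, 0.0], candidates, k=2, lambda_mult=1.0))  # ['a', 'a_copy'] (pure relevance)
print(mmr([1.0, 0.0], candidates, k=2, lambda_mult=0.3))  # ['a', 'b'] (diversity wins)
```

With `lambda_mult=1.0` MMR degenerates to plain similarity ranking and returns the near-duplicate. Lowering it penalizes the duplicate and surfaces the distinct document.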

### Reranking

Optional LLM-based reordering of candidate chunks.

### Metadata Enrichment

Optional per-chunk summaries, keywords, and hypothetical questions generated during ingestion.

### Query Planning & Grounding

Controls context assembly and factual grounding constraints.
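A common pattern for context assembly is to pack the highest-ranked chunks into the prompt until a budget is reached, numbering them so the answer can cite sources. A grounding constraint then instructs the model to answer only from that context. The sketch below illustrates the pattern with a character budget. The function name, prompt wording, and budget are hypothetical and do not reproduce Vectra's internal planner or prompts.

```python
from typing import List


def build_grounded_prompt(question: str, ranked_chunks: List[str],
                          max_context_chars: int = 2000) -> str:
    """Number ranked chunks, pack them under a character budget, and add grounding rules."""
    entries: List[str] = []
    used = 0
    for i, chunk in enumerate(ranked_chunks, start=1):
        entry = f"[{i}] {chunk}"
        cost = len(entry) + (1 if entries else 0)  # +1 for the joining newline
        if used + cost > max_context_chars:
            break  # Lower-ranked chunks that no longer fit are dropped.
        entries.append(entry)
        used += cost
    context = "\n".join(entries)
    return (
        "Answer using ONLY the numbered context below and cite sources as [n]. "
        "If the context does not contain the answer, say that you don't know.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )
```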

### Conversation Memory

Persist multi-turn chat history across sessions.
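Conceptually, bounded chat memory keeps recent messages per session and discards the oldest once a cap is reached. This sketch assumes that is what the `max_messages` setting in the configuration reference means. It uses `collections.deque(maxlen=...)` in process. The class and method names are hypothetical, and Vectra's Redis and Postgres memory backends persist history externally rather than in a Python dict.

```python
from collections import defaultdict, deque
from typing import Deque, Dict, List


class InMemoryChatHistory:
    """Per-session message history capped at max_messages (oldest dropped first)."""

    def __init__(self, max_messages: int = 20) -> None:
        # Each new session gets its own bounded deque.
        self._sessions: Dict[str, Deque[dict]] = defaultdict(
            lambda: deque(maxlen=max_messages)
        )

    def add(self, session_id: str, role: str, content: str) -> None:
        self._sessions[session_id].append({"role": role, "content": content})

    def get(self, session_id: str) -> List[dict]:
        # .get() avoids creating an empty entry for unknown sessions.
        return list(self._sessions.get(session_id, ()))
```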

---

## 7. Configuration Reference (Usage-Driven)

> All configuration is validated using **Pydantic** at runtime.

### Embedding

```python
embedding={
  'provider': ProviderType.OPENAI,
  'api_key': os.getenv('OPENAI_API_KEY'),
  'model_name': 'text-embedding-3-small',
  'dimensions': 1536
}
```

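When using pgvector, set `dimensions` to match the column's declared dimension (for example `vector(1536)`) so inserts do not fail with a dimension mismatch at runtime.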

---

### LLM

```python
llm={
  'provider': ProviderType.GEMINI,
  'api_key': os.getenv('GOOGLE_API_KEY'),
  'model_name': 'gemini-2.5-flash',
  'temperature': 0.3,
  'max_tokens': 1024
}
```

Configures the LLM used for answer generation.

---

### Database

Supports PostgreSQL (native `asyncpg` or Prisma), ChromaDB, Qdrant, and Milvus.

```python
# PostgreSQL (native asyncpg)
database={
  'type': 'postgres',
  'client_instance': pg_pool,  # asyncpg.Pool or Connection
  'table_name': 'document',
  'column_map': { 'content': 'content', 'metadata': 'metadata', 'vector': 'vector' }
}
```

```python
# Prisma (Postgres via prisma-client-py)
database={
  'type': 'prisma',
  'client_instance': prisma,
  'table_name': 'Document',
  'column_map': { 'content': 'content', 'metadata': 'metadata', 'vector': 'embedding' }
}
```

```python
# ChromaDB
database={
  'type': 'chroma',
  'client_instance': chroma_client,  # chromadb.Client or PersistentClient
  'table_name': 'rag_collection',
  'column_map': { 'content': 'content', 'metadata': 'metadata', 'vector': 'embedding' }
}
```

```python
# Qdrant
database={
  'type': 'qdrant',
  'client_instance': qdrant_client,  # qdrant_client.QdrantClient
  'table_name': 'rag_collection',
  'column_map': { 'content': 'content', 'metadata': 'metadata', 'vector': 'embedding' }
}
```

```python
# Milvus
database={
  'type': 'milvus',
  'client_instance': milvus_client,  # pymilvus client
  'table_name': 'rag_collection',
  'column_map': { 'content': 'content', 'metadata': 'metadata', 'vector': 'embedding' }
}
```

---

### Chunking

```python
chunking={
  'strategy': ChunkingStrategy.RECURSIVE,
  'chunk_size': 1000,
  'chunk_overlap': 200
}
```
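With `chunk_size=1000` and `chunk_overlap=200`, consecutive chunks share 200 units of text, so each new chunk starts 800 units after the previous one. The sliding-window sketch below shows only that arithmetic, using characters as the unit. Vectra's recursive strategy also respects separators and tokens, which this hypothetical `window_offsets` helper ignores.

```python
from typing import List, Tuple


def window_offsets(text_length: int, chunk_size: int = 1000,
                   chunk_overlap: int = 200) -> List[Tuple[int, int]]:
    """Return (start, end) offsets of fixed-size windows that overlap by chunk_overlap."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")
    step = chunk_size - chunk_overlap  # 800 with the settings above
    offsets: List[Tuple[int, int]] = []
    start = 0
    while start < text_length:
        end = min(start + chunk_size, text_length)
        offsets.append((start, end))
        if end == text_length:
            break
        start += step
    return offsets


print(window_offsets(2500))  # [(0, 1000), (800, 1800), (1600, 2500)]
```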

Agentic:

```python
chunking={
  'strategy': ChunkingStrategy.AGENTIC,
  'agentic_llm': {
    'provider': ProviderType.OPENAI,
    'api_key': os.getenv('OPENAI_API_KEY'),
    'model_name': 'gpt-4o-mini'
  }
}
```

---

### Retrieval

```python
retrieval={ 'strategy': RetrievalStrategy.HYBRID }
```

Hybrid is recommended for production workloads.
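Hybrid retrieval combines a semantic (vector) ranking with a lexical (keyword) ranking using Reciprocal Rank Fusion (RRF), which the Feature Matrix also lists for Multi-Query expansion. Each document scores `sum(1 / (k + rank))` over the lists it appears in, with `k = 60` as a commonly used default. The sketch below shows only the fusion step on document IDs. It is illustrative, not Vectra's implementation, and the `k` value Vectra uses is not documented here.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def reciprocal_rank_fusion(rankings: Sequence[Sequence[str]], k: int = 60) -> List[str]:
    """Fuse several best-first rankings of doc IDs into one list, best first."""
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=lambda doc_id: scores[doc_id], reverse=True)


semantic = ["doc_a", "doc_b", "doc_c"]  # e.g. from vector similarity
lexical = ["doc_c", "doc_a", "doc_d"]   # e.g. from keyword / full-text search
print(reciprocal_rank_fusion([semantic, lexical]))  # ['doc_a', 'doc_c', 'doc_b', 'doc_d']
```

RRF uses only ranks, so it can merge lists whose raw scores (cosine similarity vs. keyword relevance) are not comparable.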

---

### Reranking

```python
reranking={
  'enabled': True,
  'window_size': 20,
  'top_n': 5
}
```
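One plausible reading of these settings, assumed here rather than confirmed by this README, is that the first `window_size` retrieved candidates are rescored and the best `top_n` are kept. The sketch below replaces the LLM judgment with a plain scoring function so the windowing logic runs offline. `rerank`, `score_fn`, and `keyword_overlap` are hypothetical names.

```python
from typing import Callable, List


def rerank(query: str, candidates: List[str], score_fn: Callable[[str, str], float],
           window_size: int = 20, top_n: int = 5) -> List[str]:
    """Rescore the first window_size candidates and return the top_n, best first."""
    window = candidates[:window_size]  # Candidates beyond the window are never rescored.
    ranked = sorted(window, key=lambda chunk: score_fn(query, chunk), reverse=True)
    return ranked[:top_n]


def keyword_overlap(query: str, chunk: str) -> float:
    """Toy stand-in for an LLM relevance judgment: count shared lowercase words."""
    return float(len(set(query.lower().split()) & set(chunk.lower().split())))
```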

---

### Memory

```python
memory={ 'enabled': True, 'type': 'in-memory', 'max_messages': 20 }
```

Redis and Postgres memory backends are also supported:

```python
# Redis
memory={
  'enabled': True,
  'type': 'redis',
  'max_messages': 20,
  'redis': {
    'client_instance': redis_client,
    'key_prefix': 'vectra:chat:'
  }
}
```

```python
# Postgres
memory={
  'enabled': True,
  'type': 'postgres',
  'max_messages': 20,
  'postgres': {
    'client_instance': pg_pool,  # asyncpg.Pool or Connection
    'table_name': 'ChatMessage',
    'column_map': {
      'sessionId': 'sessionId',
      'role': 'role',
      'content': 'content',
      'createdAt': 'createdAt'
    }
  }
}
```

---

### Observability

```python
observability={
  'enabled': True,
  'sqlite_path': 'vectra-observability.db'
}
```

---

## 8. Ingestion Pipeline

```python
await client.ingest_documents('./documents')
```

* Files or directories supported
* Recursive traversal
* Embedding cache via SHA256
* Optional rate limiting

Supported formats: PDF, DOCX, XLSX, TXT, Markdown
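The SHA256 embedding cache listed above avoids re-embedding text that has already been processed. The in-memory sketch below illustrates the idea. Including the model name in the hash key is an assumption made here so that switching embedding models never returns stale vectors. `EmbeddingCache` and `embed_fn` are hypothetical names, not Vectra's API.

```python
import hashlib
from typing import Callable, Dict, List

Vector = List[float]


class EmbeddingCache:
    """Memoize embeddings keyed by SHA256(model_name + text)."""

    def __init__(self, model_name: str, embed_fn: Callable[[str], Vector]) -> None:
        self.model_name = model_name
        self.embed_fn = embed_fn
        self._store: Dict[str, Vector] = {}
        self.misses = 0

    def _key(self, text: str) -> str:
        # A NUL separator keeps the model name and text unambiguous inside the hash input.
        payload = f"{self.model_name}\x00{text}".encode("utf-8")
        return hashlib.sha256(payload).hexdigest()

    def embed(self, text: str) -> Vector:
        key = self._key(text)
        if key not in self._store:
            self.misses += 1  # Only cache misses call the (paid) embedding provider.
            self._store[key] = self.embed_fn(text)
        return self._store[key]
```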

---

## 9. Querying & Streaming

Standard:

```python
res = await client.query_rag('Refund policy?')
```

Streaming:

```python
stream = await client.query_rag('Draft email', stream=True)
async for chunk in stream:
    print(chunk.get('delta', ''), end='')
```
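Each streamed chunk in the example above is a dict whose `delta` key carries the next piece of text. If you also need the complete answer afterwards, accumulate the deltas while printing them. The sketch below does that. So it can run without API keys, it consumes a hypothetical `fake_stream` async generator in place of the stream returned by `client.query_rag(..., stream=True)`.

```python
import asyncio
from typing import AsyncIterator, Dict, List


async def collect_stream(stream: AsyncIterator[Dict[str, str]]) -> str:
    """Print deltas as they arrive and return the concatenated answer."""
    parts: List[str] = []
    async for chunk in stream:
        delta = chunk.get("delta", "")
        print(delta, end="", flush=True)
        parts.append(delta)
    print()
    return "".join(parts)


async def fake_stream() -> AsyncIterator[Dict[str, str]]:
    # Stand-in for the stream returned by client.query_rag(..., stream=True).
    for piece in ["Dear team, ", "please find ", "the draft below."]:
        yield {"delta": piece}
        await asyncio.sleep(0)


answer = asyncio.run(collect_stream(fake_stream()))
```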

---

## 10. Conversation Memory

Enable `memory` in the configuration (see Section 7) and pass a `session_id` to preserve multi-turn context across queries.

---

## 11. Evaluation & Quality Measurement

```python
await client.evaluate([
  { 'question': 'Capital of France?', 'expected_ground_truth': 'Paris' }
])
```

Metrics: Faithfulness, Relevance

---

## 12. CLI

### Ingest & Query

```bash
vectra ingest ./docs --config=./config.json
vectra query "What are the payment terms?" --config=./config.json --stream
```

---

### WebConfig (Config Generator UI)

```bash
vectra webconfig
```

Launches a local web UI to interactively generate and validate `vectra.config.json`.

---

### Observability Dashboard

```bash
vectra dashboard
```

Launches a local dashboard for metrics, traces, and session analysis.

---

## 13. Observability & Callbacks

When `observability` is enabled, Vectra tracks metrics, traces, and chat sessions.

Callbacks allow hooking into ingestion, retrieval, reranking, and generation stages.
 
---

## 14. Telemetry

Vectra collects anonymous usage data to help us improve the SDK, prioritize features, and detect broken versions.

### What we track
* **Identity**: A random UUID (`distinct_id`) stored locally in `~/.vectra/telemetry.json`. **No PII, emails, IPs, or hostnames.**
* **Events**:
    * `sdk_initialized`: Config shape (providers used), OS/Runtime version, session type (api/cli/chat).
    * `ingest_started/completed`: Source type, chunking strategy, duration bucket, chunk count bucket.
    * `query_executed`: Retrieval strategy, query mode (rag), result count, latency bucket.
    * `feature_used`: WebConfig/Dashboard usage.
    * `evaluation_run`: Dataset size bucket.
    * `error_occurred`: Error type and stage (no stack traces).
    * `cli_command_used`: Command name and flags.

### Why we track it
* **Detect broken versions**: Spikes in `error_occurred` help us find bugs.
* **Measure adoption**: Helps us understand which providers (OpenAI vs Gemini) and vector stores are most popular.
* **Drop support safely**: We can see if anyone is still using Python 3.8 before dropping it.

### How to opt out
Telemetry is **enabled by default**. To disable it:

**Option 1: Config**
```python
client = VectraClient(
    VectraConfig(
        # ...
        telemetry={'enabled': False}
    )
)
```

**Option 2: Environment Variable**

Set `VECTRA_TELEMETRY_DISABLED=1` or `DO_NOT_TRACK=1`.

---

## 15. Database Schemas & Indexing

```prisma
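// Requires the pgvector extension: CREATE EXTENSION IF NOT EXISTS vector;
model Document {
  id        String   @id @default(uuid())
  content   String
  metadata  Json
  // Dimension must match embedding.dimensions (1536 for text-embedding-3-small)
  embedding Unsupported("vector(1536)")?
  createdAt DateTime @default(now())
}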
```
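The schema above defines no approximate-nearest-neighbour index, so similarity queries scan every row. pgvector (0.5.0+) supports HNSW indexes with per-metric operator classes such as `vector_cosine_ops`. HNSW requires the column to declare its dimension, for example `vector(1536)`. The sketch below builds such a statement and runs it through anything with an async `execute` method, such as an `asyncpg` Pool or Connection. `hnsw_index_sql` and `create_hnsw_index` are hypothetical helpers, and the default table and column names come from the Prisma model above. Match the operator class to the distance metric your queries use.

```python
from typing import Any

# pgvector operator classes per distance metric
OPCLASSES = {"cosine": "vector_cosine_ops", "l2": "vector_l2_ops", "inner_product": "vector_ip_ops"}


def quote_ident(name: str) -> str:
    """Quote a PostgreSQL identifier, doubling embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def hnsw_index_sql(table: str, column: str, metric: str = "cosine") -> str:
    """Build a pgvector HNSW CREATE INDEX statement for the given table and column."""
    opclass = OPCLASSES[metric]
    index_name = quote_ident(f"{table}_{column}_hnsw_idx")
    return (
        f"CREATE INDEX IF NOT EXISTS {index_name} "
        f"ON {quote_ident(table)} USING hnsw ({quote_ident(column)} {opclass})"
    )


async def create_hnsw_index(pool: Any, table: str = "Document", column: str = "embedding",
                            metric: str = "cosine") -> None:
    """Run the statement on an asyncpg Pool/Connection (anything with async .execute)."""
    await pool.execute(hnsw_index_sql(table, column, metric))
```

For the native `postgres` store configured in Section 7 (table `document`, vector column `vector`), the call would be `await create_hnsw_index(pool, 'document', 'vector')`.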

---

## 16. Extending Vectra

Implement custom vector stores by extending `VectorStore`.
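This README does not list the abstract methods on Vectra's `VectorStore` base class. The sketch below therefore defines a stand-in abstract base, `VectorStoreLike`, with hypothetical `add_documents` and `similarity_search` methods, and implements a brute-force in-memory store against it. Check the real base class for the required method names and signatures before adapting it.

```python
import asyncio
import math
from abc import ABC, abstractmethod
from typing import Any, Dict, List, Sequence, Tuple


class VectorStoreLike(ABC):
    """Hypothetical stand-in for vectra's VectorStore interface (method names assumed)."""

    @abstractmethod
    async def add_documents(self, docs: List[Dict[str, Any]]) -> None: ...

    @abstractmethod
    async def similarity_search(self, vector: Sequence[float],
                                limit: int) -> List[Dict[str, Any]]: ...


class InMemoryVectorStore(VectorStoreLike):
    """Brute-force cosine search over documents held in a Python list."""

    def __init__(self) -> None:
        self._docs: List[Dict[str, Any]] = []

    async def add_documents(self, docs: List[Dict[str, Any]]) -> None:
        # Each doc is expected to carry 'content', 'metadata', and 'vector' keys.
        self._docs.extend(docs)

    async def similarity_search(self, vector: Sequence[float],
                                limit: int) -> List[Dict[str, Any]]:
        def cosine(a: Sequence[float], b: Sequence[float]) -> float:
            dot = sum(x * y for x, y in zip(a, b))
            na = math.sqrt(sum(x * x for x in a))
            nb = math.sqrt(sum(y * y for y in b))
            return dot / (na * nb) if na and nb else 0.0

        scored: List[Tuple[float, Dict[str, Any]]] = [
            (cosine(vector, doc["vector"]), doc) for doc in self._docs
        ]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        # Return copies annotated with their similarity score, best first.
        return [{**doc, "score": score} for score, doc in scored[:limit]]


async def demo() -> List[Dict[str, Any]]:
    store = InMemoryVectorStore()
    await store.add_documents([
        {"content": "Refunds within 30 days", "metadata": {}, "vector": [1.0, 0.0]},
        {"content": "Shipping takes 5 days", "metadata": {}, "vector": [0.0, 1.0]},
    ])
    return await store.similarity_search([0.9, 0.1], limit=1)


results = asyncio.run(demo())
```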

---

## 17. Architecture Overview

Vectra follows a modular, provider-agnostic RAG architecture with clear separation of ingestion, retrieval, and generation pipelines.

---

## 18. Development & Contribution Guide

* Python 3.8+
* Async-first (`asyncio`)
* Pydantic-based configuration

---

## 19. Production Best Practices

* Match embedding dimensions to pgvector
* Prefer Hybrid retrieval
* Enable observability in staging
* Evaluate before changing chunk sizes

---

**Vectra (Python) scales cleanly from local prototypes to production-grade RAG platforms.**
