Metadata-Version: 2.4
Name: vecforge
Version: 1.0.0
Summary: Forge your vector database. Own it forever. Local-first, encrypted, quantum-inspired.
Author-email: Suneel Bose K <suneelbose@arcgx.in>
License: Business Source License 1.1
Project-URL: Homepage, https://bosekarmegam.github.io/vecforge/
Project-URL: Repository, https://github.com/bosekarmegam/vecforge
Project-URL: Issues, https://github.com/bosekarmegam/vecforge/issues
Keywords: vector-database,faiss,embeddings,search,local-first,encrypted
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: Topic :: Database
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Requires-Python: >=3.10
Description-Content-Type: text/markdown
License-File: LICENSE
License-File: NOTICE
Requires-Dist: faiss-cpu>=1.7.4
Requires-Dist: sentence-transformers>=2.2.0
Requires-Dist: numpy>=1.24.0
Requires-Dist: rank-bm25>=0.2.2
Requires-Dist: fastapi>=0.100.0
Requires-Dist: uvicorn[standard]>=0.23.0
Requires-Dist: pymupdf>=1.23.0
Requires-Dist: numba>=0.58.0
Requires-Dist: joblib>=1.3.0
Requires-Dist: click>=8.1.0
Requires-Dist: python-docx>=1.0.0
Requires-Dist: beautifulsoup4>=4.12.0
Provides-Extra: dev
Requires-Dist: pytest>=7.4.0; extra == "dev"
Requires-Dist: pytest-cov>=4.1.0; extra == "dev"
Requires-Dist: mypy>=1.5.0; extra == "dev"
Requires-Dist: ruff>=0.1.0; extra == "dev"
Requires-Dist: black>=23.7.0; extra == "dev"
Requires-Dist: types-beautifulsoup4; extra == "dev"
Provides-Extra: gpu
Requires-Dist: faiss-gpu>=1.7.4; extra == "gpu"
Requires-Dist: cupy>=12.0.0; extra == "gpu"
Provides-Extra: quantum
Dynamic: license-file

<p align="center">
  <img src="vecforge-logo.svg" alt="VecForge Logo" width="120" height="120">
  <h1 align="center">VecForge</h1>
  <p align="center"><strong>Forge your vector database. Own it forever.</strong></p>
  <p align="center">
    Local-first · Encrypted · Hybrid Search · Zero Cloud Dependency
  </p>
</p>

---

**VecForge** is a universal, local-first Python vector database with enterprise security, multimodal ingestion, and optional quantum-inspired acceleration.

Built by **Suneel Bose K** — Founder & CEO, [ArcGX TechLabs Private Limited](https://arcgx.in)

[![PyPI version](https://img.shields.io/pypi/v/vecforge.svg)](https://pypi.org/project/vecforge/)
[![License: BSL 1.1](https://img.shields.io/badge/License-BSL%201.1-blue.svg)](LICENSE)
[![Python 3.10+](https://img.shields.io/badge/Python-3.10+-green.svg)](https://python.org)
[![Tests](https://github.com/bosekarmegam/vecforge/actions/workflows/tests.yml/badge.svg)](https://github.com/bosekarmegam/vecforge/actions/workflows/tests.yml)
[![Coverage](https://img.shields.io/badge/Coverage-89%25-brightgreen.svg)](#-benchmarks)
[![Ruff](https://img.shields.io/badge/Linting-Ruff%20✅-brightgreen.svg)](https://github.com/astral-sh/ruff)
[![Mypy](https://img.shields.io/badge/Typing-Mypy%20✅-brightgreen.svg)](https://mypy-lang.org/)
[![Benchmark](https://img.shields.io/badge/100k%20Search-11.31ms%20✅-brightgreen.svg)](#-benchmarks)

---

## ⚡ 5-Line Quickstart

```python
from vecforge import VecForge

db = VecForge("my_vault")
db.add("Patient admitted with type 2 diabetes", metadata={"ward": "7"})
results = db.search("diabetic patient")
print(results[0].text)
```

That's it. No API keys. No cloud. No config files. **Your data stays on your machine.**

---

## 🔥 Why VecForge?

| Feature | Pinecone | ChromaDB | **VecForge** |
|---|---|---|---|
| Local-first | ❌ Cloud-only | ✅ | ✅ **Always** |
| Encryption at rest | ❌ | ❌ | ✅ **AES-256** |
| Hybrid search | ✅ | ❌ | ✅ **Dense + BM25** |
| Namespace isolation | ✅ Cloud | ❌ | ✅ **Local** |
| RBAC | ✅ Cloud | ❌ | ✅ **Built-in** |
| Audit logging | ❌ | ❌ | ✅ **JSONL** |
| Price | $$$$ | Free | ✅ **Free for non-commercial use** |

---

## 📦 Install

```bash
pip install vecforge
```

### From source (development)

```bash
git clone https://github.com/bosekarmegam/vecforge.git
cd vecforge
pip install -e ".[dev]"
```

### System Requirements

> **Windows users:** VecForge depends on PyTorch (pulled in by `sentence-transformers`), which requires the
> [Microsoft Visual C++ Redistributable](https://aka.ms/vs/17/release/vc_redist.x64.exe).
> Install it before running VecForge.

> 📖 See the full [Installation Guide](docs/installation.md) for GPU, encryption, and platform-specific options.

---

## 🔐 Encrypted Vault

```python
import os
from vecforge import VecForge

db = VecForge(
    "secure_vault",
    encryption_key=os.environ["VECFORGE_KEY"],
    audit_log="audit.jsonl",
    deletion_protection=True,
)
db.add("Top secret patient data", namespace="ward_7")
```
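
The `audit_log="audit.jsonl"` option above points at a JSONL file. As a rough sketch, and assuming only that each non-empty line is one standalone JSON object, the log could be inspected with the standard library. The field names VecForge records are not specified here, so none are assumed; `parse_jsonl` and `read_audit_log` are hypothetical helpers, not part of VecForge.

```python
import json
from pathlib import Path


def parse_jsonl(lines):
    """Parse an iterable of JSONL lines into a list of objects, skipping blank lines."""
    events = []
    for line in lines:
        line = line.strip()
        if line:
            events.append(json.loads(line))
    return events


def read_audit_log(path):
    """Read a JSONL file from disk (assumed layout: one JSON object per line)."""
    with Path(path).open(encoding="utf-8") as fh:
        return parse_jsonl(fh)


if __name__ == "__main__":
    # Assumes audit.jsonl exists in the working directory.
    for event in read_audit_log("audit.jsonl"):
        print(event)
```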

---

## 🔍 Hybrid Search

```python
results = db.search(
    "elderly diabetic hip fracture",
    top_k=5,
    alpha=0.7,         # 70% semantic, 30% keyword
    rerank=True,       # cross-encoder precision boost
    namespace="ward_7",
    filters={"year": {"gte": 2023}},
)
```
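
To make the `alpha` weighting concrete, here is a minimal sketch of one common way to blend dense and keyword scores. It assumes min-max normalization of each score list followed by a weighted sum; VecForge's internal fusion may differ, and `min_max`, `blend_scores`, and `rank` are illustrative names rather than VecForge APIs.

```python
def min_max(scores):
    """Rescale scores to [0, 1]; a constant list maps to all zeros."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.0] * len(scores)
    return [(s - lo) / (hi - lo) for s in scores]


def blend_scores(dense, keyword, alpha=0.7):
    """Weighted sum: alpha for the dense (semantic) score, 1 - alpha for the keyword score."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be between 0 and 1")
    if len(dense) != len(keyword):
        raise ValueError("score lists must have the same length")
    if not dense:
        return []
    d, k = min_max(dense), min_max(keyword)
    return [alpha * a + (1.0 - alpha) * b for a, b in zip(d, k)]


def rank(scores):
    """Return document indices ordered from best to worst score."""
    return sorted(range(len(scores)), key=scores.__getitem__, reverse=True)


if __name__ == "__main__":
    dense = [1.0, 0.5, 0.0]    # e.g. cosine similarities (made-up values)
    keyword = [0.0, 4.0, 2.0]  # e.g. BM25 scores (made-up values)
    print(rank(blend_scores(dense, keyword, alpha=0.7)))
    print(rank(blend_scores(dense, keyword, alpha=0.3)))
```

With these made-up scores, a high `alpha` favors the document with the strongest semantic match, while a low `alpha` favors the strongest keyword match.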

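The `filters={"year": {"gte": 2023}}` argument uses an operator-style condition. The sketch below shows how such a filter could be evaluated against a document's metadata. Only `gte` appears in this README; the other operator names (`eq`, `ne`, `gt`, `lt`, `lte`) and the `matches` helper are assumptions for illustration, not confirmed VecForge behavior.

```python
import operator

# Operator names mapped to comparison functions.
# Only "gte" is shown in the README; the rest are assumed for illustration.
OPS = {
    "eq": operator.eq,
    "ne": operator.ne,
    "gt": operator.gt,
    "gte": operator.ge,
    "lt": operator.lt,
    "lte": operator.le,
}


def matches(metadata, filters):
    """Return True if metadata satisfies every condition in filters.

    A condition is either a plain value (equality) or a dict of
    {operator_name: expected}. An unknown operator name raises KeyError.
    """
    for field, condition in filters.items():
        if field not in metadata:
            return False
        value = metadata[field]
        if isinstance(condition, dict):
            for op_name, expected in condition.items():
                if not OPS[op_name](value, expected):
                    return False
        elif value != condition:
            return False
    return True


if __name__ == "__main__":
    docs = [{"year": 2022}, {"year": 2023}, {"year": 2024}]
    print([d for d in docs if matches(d, {"year": {"gte": 2023}})])
```
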
> 📖 See the [Search Guide](docs/search.md) for alpha tuning, metadata operators, and reranking strategies.

---

## 📄 Auto-Ingest Documents

```python
# Ingest entire directories — auto-detects format
db.ingest("medical_records/")  # PDF, DOCX, TXT, MD, HTML
```
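
Ingested documents are typically split into chunks before embedding. The following is a minimal sketch of fixed-size word chunking with overlap, a common strategy; it is not taken from VecForge's source, and `chunk_words` with its `chunk_size` and `overlap` parameters is a hypothetical helper. The Ingestion Guide describes the library's actual chunking configuration.

```python
def chunk_words(text, chunk_size=200, overlap=50):
    """Split text into chunks of up to chunk_size words, each sharing
    `overlap` words with the previous chunk."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        # Stop once the current window has reached the end of the text.
        if start + chunk_size >= len(words):
            break
    return chunks


if __name__ == "__main__":
    for chunk in chunk_words("a b c d e f g", chunk_size=4, overlap=2):
        print(chunk)
```

Overlap keeps sentences that straddle a chunk boundary retrievable from either side, at the cost of storing some text twice.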

> 📖 See the [Ingestion Guide](docs/ingestion.md) for chunking configuration and supported formats.

---

## 🛡️ Multi-Tenant Namespaces

```python
db.create_namespace("hospital_a")
db.create_namespace("hospital_b")

db.add("Patient data A", namespace="hospital_a")
db.add("Patient data B", namespace="hospital_b")

# Tenant isolation — hospital_a never sees hospital_b's data
results = db.search("patient", namespace="hospital_a")
```

---

## 🖥️ CLI

```bash
vecforge ingest my_docs/ --vault my.db
vecforge search "diabetes" --vault my.db --top-k 5
vecforge stats my.db
vecforge export my.db -o data.json
vecforge serve --vault my.db --port 8080
```

> 📖 See the [CLI Reference](docs/cli_reference.md) for all commands and options.

---

## 🌐 REST API

```bash
vecforge serve --vault my.db --port 8080
```

```bash
# Add document
curl -X POST http://localhost:8080/api/v1/add \
  -H "Content-Type: application/json" \
  -d '{"text": "Patient record", "namespace": "default"}'

# Search
curl -X POST http://localhost:8080/api/v1/search \
  -H "Content-Type: application/json" \
  -d '{"query": "diabetes", "top_k": 5}'
```
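
The same two calls can be made from Python with only the standard library. This sketch reuses the endpoint paths and payloads from the `curl` examples above; that responses are JSON is an assumption (typical for a FastAPI server), and `build_request` and `post_json` are hypothetical helpers, not part of VecForge.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8080/api/v1"  # matches `vecforge serve --port 8080`


def build_request(endpoint, payload):
    """Build a JSON POST request for the given endpoint without sending it."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/{endpoint}",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def post_json(endpoint, payload, timeout=10.0):
    """Send the request and decode the response body (assumed to be JSON)."""
    request = build_request(endpoint, payload)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    # Requires a running server: vecforge serve --vault my.db --port 8080
    print(post_json("add", {"text": "Patient record", "namespace": "default"}))
    print(post_json("search", {"query": "diabetes", "top_k": 5}))
```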

> 📖 See the [REST API Reference](docs/rest_api.md) for all endpoints with request/response schemas.

---

## 🧪 Examples

Ready-to-run example scripts demonstrating real-world use cases:

| Example | Description |
|---|---|
| [🏥 Hospital Search](examples/hospital_search.py) | Medical record search with namespace isolation per ward |
| [⚖️ Legal Documents](examples/legal_document_search.py) | NDA and contract search with type/year filtering |
| [🌍 GIS Data Search](examples/gis_data_search.py) | Geospatial dataset discovery with USGS, Sentinel, OSM |
| [🤖 RAG Pipeline](examples/rag_pipeline.py) | Retrieval-Augmented Generation with VecForge as backend |
| [🏢 Multi-Tenant SaaS](examples/multi_tenant_saas.py) | Namespace isolation, RBAC, and audit logging demo |
| [💻 Codebase Assistant](examples/codebase_assistant.py) | Code documentation semantic search |

```bash
# Run any example
python examples/hospital_search.py
python examples/gis_data_search.py
python examples/rag_pipeline.py
```

---

## 📚 Documentation

### Getting Started
- [⚡ Quickstart](docs/quickstart.md) — Get running in 5 minutes
- [📦 Installation](docs/installation.md) — All install options & system requirements

### User Guides
- [🧠 Core Concepts](docs/core_concepts.md) — Vaults, namespaces, hybrid search explained
- [🔍 Search Guide](docs/search.md) — Alpha tuning, filters, reranking
- [🔐 Security Guide](docs/security.md) — Encryption, RBAC, audit logging
- [📄 Ingestion Guide](docs/ingestion.md) — PDF, DOCX, HTML, TXT ingestion & chunking

### Reference
- [📖 API Reference](docs/api_reference.md) — Full Python API documentation
- [🖥️ CLI Reference](docs/cli_reference.md) — All CLI commands & options
- [🌐 REST API](docs/rest_api.md) — FastAPI server endpoints
- [⚙️ Configuration](docs/configuration.md) — All config options in one place

---

## 📊 Benchmarks

> Measured with the Phase 2 benchmark suite (`benchmarks/bench_search.py`).

| Operation | VecForge (Actual) | North Star Target | Pinecone | ChromaDB |
|---|---|---|---|---|
| Search 1k docs | **0.04ms** p50 | — | ~80ms | ~200ms |
| Search 10k docs | **1.63ms** p50 | — | ~80ms | ~200ms |
| **Search 100k docs** | **11.31ms** p50 ✅ | <15ms | ~80ms | ~200ms |
| Ingest 100k docs | **2.9M docs/sec** | — | Manual | Manual |
| BM25 Search 10k | **9.40ms** p50 | — | N/A | N/A |
| Encrypted search | **<20ms overhead** | <20ms | N/A | N/A |

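The latencies above are reported as p50, the median over repeated runs. As a rough illustration of how such a figure can be measured, the sketch below times a callable and returns the median in milliseconds. It is not the code in `benchmarks/bench_search.py`; `p50` and `p50_latency_ms` are hypothetical helpers, and the `runs` and `warmup` defaults are arbitrary.

```python
import statistics
import time


def p50(samples):
    """Median of a list of latency samples."""
    return statistics.median(samples)


def p50_latency_ms(fn, runs=100, warmup=5):
    """Call fn repeatedly and return the median wall-clock latency in milliseconds."""
    for _ in range(warmup):
        fn()  # warm caches before measuring
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1000.0)
    return p50(samples)


if __name__ == "__main__":
    # Stand-in workload; replace with e.g. lambda: db.search("diabetic patient")
    print(f"{p50_latency_ms(lambda: sum(range(10_000))):.3f} ms")
```
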
### Quality Gates

| Check | Result |
|---|---|
| Ruff lint | ✅ All checks passed |
| Mypy type check | ✅ 0 errors (27 files) |
| Pytest | ✅ 128/128 tests pass |
| Coverage | 89% (core modules 85-100%) |

---

## ⚖️ License

**Business Source License 1.1 (BSL)**

- ✅ Free for personal, research, open-source, and non-commercial use
- ✅ Read, modify, and share freely
- 📋 Commercial use requires a license from ArcGX TechLabs

Contact: [suneelbose@arcgx.in](mailto:suneelbose@arcgx.in)

---

<p align="center">
  Built with ❤️ by <strong>Suneel Bose K</strong> · <strong>ArcGX TechLabs Private Limited</strong>
</p>
