Metadata-Version: 2.4
Name: keradb
Version: 0.1.0
Summary: MongoDB-compatible Python client for KeraDB - a lightweight embedded NoSQL database with vector search
Author: KeraDB Team
License: MIT
Project-URL: Homepage, https://github.com/keradb/keradb
Project-URL: Documentation, https://keradb.github.io
Project-URL: Repository, https://github.com/keradb/keradb
Project-URL: Issues, https://github.com/keradb/keradb/issues
Keywords: database,nosql,embedded,document,storage,vector-search,mongodb-compatible,keradb
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Database
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Requires-Python: >=3.8
Description-Content-Type: text/markdown
Provides-Extra: dev
Requires-Dist: pytest>=7.0.0; extra == "dev"
Requires-Dist: pytest-benchmark>=4.0.0; extra == "dev"
Requires-Dist: numpy>=1.20.0; extra == "dev"
Provides-Extra: benchmark
Requires-Dist: pytest-benchmark>=4.0.0; extra == "benchmark"
Requires-Dist: numpy>=1.20.0; extra == "benchmark"

# KeraDB Python SDK

A MongoDB-compatible Python client for KeraDB - a lightweight, embedded NoSQL document database with advanced vector search capabilities.

[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
[![Python Version](https://img.shields.io/badge/python-3.8+-blue.svg)](https://www.python.org/downloads/)

## Features

- **MongoDB-Compatible API**: Familiar API for easy migration from MongoDB
- **Embedded Database**: No server required, runs directly in your application
- **Vector Search**: Built-in HNSW-based vector similarity search
- **Multiple Distance Metrics**: Cosine, Euclidean, Dot Product, Manhattan
- **Vector Compression**: Delta and quantized compression for efficient storage
- **Zero Dependencies**: Pure Python with no external dependencies for document operations
- **High Performance**: Written in Rust with Python bindings via FFI
- **ACID Transactions**: Full transaction support for data integrity

## Installation

### From PyPI (when published)

```bash
pip install keradb
```

### From Source

1. First, build the native KeraDB library:

```bash
cd ../../../  # Navigate to project root
cargo build --release
```

2. Install the Python package:

```bash
cd sdks/python
pip install -e .
```

### For Development

```bash
pip install -e ".[dev]"
```

## Quick Start

### Basic Document Operations

```python
import keradb

# Connect to database (creates if doesn't exist)
client = keradb.connect("mydb.ndb")
db = client.database()
users = db.collection("users")

# Insert documents
result = users.insert_one({"name": "Alice", "age": 30, "email": "alice@example.com"})
print(f"Inserted ID: {result.inserted_id}")

# Find documents
user = users.find_one({"_id": result.inserted_id})
all_users = users.find().all()

# Update documents
users.update_one(
    {"_id": result.inserted_id},
    {"$set": {"age": 31}}
)

# Delete documents
users.delete_one({"_id": result.inserted_id})

# Close connection
client.close()
```

### Using Context Manager

```python
import keradb

with keradb.connect("mydb.ndb") as client:
    db = client.database()
    users = db.collection("users")
    
    users.insert_one({"name": "Bob", "age": 25})
    count = users.count_documents({})
    print(f"Total users: {count}")
```

### Vector Search

```python
import keradb
import random
import math

# Generate a random normalized embedding
def generate_embedding(dimensions):
    vec = [random.random() * 2 - 1 for _ in range(dimensions)]
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]

# Connect and create vector collection
client = keradb.connect("vectors.ndb")

config = keradb.VectorConfig(
    dimensions=128,
    distance=keradb.Distance.COSINE,
    m=16,
    ef_construction=200,
    ef_search=50,
).with_delta_compression()

client.create_vector_collection("articles", config)

# Insert vectors with metadata
embedding = generate_embedding(128)
vector_id = client.insert_vector(
    "articles",
    embedding,
    {"title": "Machine Learning Basics", "category": "tech"}
)

# Search for similar vectors
query = generate_embedding(128)
results = client.vector_search("articles", query, k=10)

for result in results:
    print(f"[{result.rank}] {result.document.metadata['title']}")
    print(f"    Score: {result.score:.4f}")

# Get statistics
stats = client.vector_stats("articles")
print(f"Vectors: {stats.vector_count}, Memory: {stats.memory_usage:,} bytes")

client.close()
```

## API Reference

### Client

#### `keradb.connect(path: str) -> Client`

Create or open a KeraDB database.

**Parameters:**
- `path`: Path to the database file

**Returns:** Client instance

### Database

#### `client.database(name: Optional[str] = None) -> Database`

Get a database instance. The `name` parameter is optional and kept for MongoDB compatibility.

#### `database.collection(name: str) -> Collection`

Get a collection by name.

#### `database.list_collection_names() -> List[str]`

Get a list of all collection names.

### Collection

#### Document Operations

- `insert_one(document: Dict) -> InsertOneResult`
- `insert_many(documents: List[Dict]) -> InsertManyResult`
- `find_one(filter: Optional[Dict] = None) -> Optional[Dict]`
- `find(filter: Optional[Dict] = None) -> Cursor`
- `update_one(filter: Dict, update: Dict) -> UpdateResult`
- `update_many(filter: Dict, update: Dict) -> UpdateResult`
- `delete_one(filter: Dict) -> DeleteResult`
- `delete_many(filter: Dict) -> DeleteResult`
- `count_documents(filter: Optional[Dict] = None) -> int`

#### Supported MongoDB Operators

**Update Operators:**
- `$set`: Set field values
- `$unset`: Remove fields
- `$inc`: Increment numeric values
- `$push`: Append to arrays

**Query Operators:**
- `$eq`, `$ne`: Equality, inequality
- `$gt`, `$gte`, `$lt`, `$lte`: Comparison
- `$in`, `$nin`: Array membership
- `$and`, `$or`: Logical operators

### Cursor

- `limit(n: int) -> Cursor`: Limit number of results
- `skip(n: int) -> Cursor`: Skip n results
- `all() -> List[Dict]`: Return all documents as a list

### Vector Operations

#### Creating Vector Collections

```python
config = keradb.VectorConfig(
    dimensions=128,
    distance=keradb.Distance.COSINE,
    m=16,  # HNSW connections per node
    ef_construction=200,  # Build quality
    ef_search=50,  # Query quality
)

client.create_vector_collection("my_vectors", config)
```

#### Vector Configuration Options

**Distance Metrics:**
- `Distance.COSINE`: Cosine similarity (default)
- `Distance.EUCLIDEAN`: L2 distance
- `Distance.DOT_PRODUCT`: Dot product
- `Distance.MANHATTAN`: L1 distance

**Compression:**
- `with_delta_compression()`: Store sparse differences
- `with_quantized_compression()`: Aggressive quantization
- No compression (default): Store full vectors

#### Vector CRUD

```python
# Insert
vector_id = client.insert_vector(collection, embedding, metadata)

# Search
results = client.vector_search(collection, query_vector, k=10)

# Get by ID
doc = client.get_vector(collection, vector_id)

# Delete
client.delete_vector(collection, vector_id)

# Statistics
stats = client.vector_stats(collection)
```

## Examples

See the [examples](./examples/) directory for complete examples:

- [basic.py](./examples/basic.py) - Basic document operations
- [vector_search.py](./examples/vector_search.py) - Vector search demo

Run examples:

```bash
python examples/basic.py
python examples/vector_search.py
```

## Benchmarks

Run benchmarks to compare performance:

```bash
# Install benchmark dependencies
pip install -e ".[benchmark]"

# Run all benchmarks
pytest benchmarks/ -v

# Run specific benchmarks
pytest benchmarks/benchmark_documents.py -v
pytest benchmarks/benchmark_vectors.py -v
```

See [benchmarks/README.md](./benchmarks/README.md) for more details.

## Testing

Run the test suite:

```bash
# Install dev dependencies
pip install -e ".[dev]"

# Run tests
pytest tests/ -v

# Run with coverage
pytest tests/ --cov=keradb --cov-report=html
```

## Requirements

- Python 3.8 or higher
- KeraDB native library (libkeradb.so / libkeradb.dylib / keradb.dll)

### Optional Dependencies

For benchmarks and development:
- pytest >= 7.0.0
- pytest-benchmark >= 4.0.0
- numpy >= 1.20.0 (for faster vector generation in benchmarks)

## Platform Support

- Linux (x86_64, ARM64)
- macOS (Intel, Apple Silicon)
- Windows (x86_64)

## Performance

KeraDB is designed for high performance:

- **Document Operations**: 10,000+ inserts/sec, sub-millisecond reads
- **Vector Search**: Sub-millisecond similarity search on millions of vectors
- **Memory Efficient**: Delta and quantized compression reduce memory usage by 60-80%
- **Zero-Copy**: Efficient FFI layer with minimal overhead

## Architecture

```
┌─────────────────────────────┐
│    Python Application       │
└──────────┬──────────────────┘
           │
           ├─ keradb.connect()
           ├─ Collection API (MongoDB-compatible)
           └─ Vector Search API
           │
┌──────────▼──────────────────┐
│    Python FFI Layer         │
│  (ctypes bindings)          │
└──────────┬──────────────────┘
           │
┌──────────▼──────────────────┐
│   Rust Core Library         │
│  - Document storage (LSM)   │
│  - Vector search (HNSW)     │
│  - Compression              │
└─────────────────────────────┘
```

## MongoDB Compatibility

The SDK aims to be compatible with MongoDB's API where practical:

✅ Supported:
- Basic CRUD operations
- Query operators ($eq, $gt, $in, etc.)
- Update operators ($set, $inc, $push, etc.)
- Cursor operations (limit, skip)
- find_one, find, insert_one, insert_many
- update_one, update_many, delete_one, delete_many

⚠️ Partial Support:
- Aggregation pipeline (limited)
- Indexes (automatic for performance)

❌ Not Supported:
- GridFS
- Transactions (planned)
- Replication
- Sharding

## Performance

KeraDB delivers exceptional performance for embedded database operations. Here are benchmark results comparing KeraDB vs SQLite on Windows with Python 3.13:

### KeraDB vs SQLite Performance Comparison

| Operation | KeraDB (μs) | SQLite (μs) | **Speedup** | KeraDB OPS | SQLite OPS |
|-----------|-------------|-------------|-------------|------------|------------|
| **Count** | 1.7 | 116.0 | **68x faster** | 579,980 | 8,622 |
| **Find by ID** | 10.7 | 131.7 | **12x faster** | 93,510 | 7,595 |
| **Update** | 82.9 | 159.4 | **2x faster** | 12,059 | 6,272 |
| **Insert** | 99.3 | 5,234 | **53x faster** | 10,066 | 191 |
| **Find All** | 461.3 | 390.5 | 1.2x slower | 2,168 | 2,561 |
| **Delete** | 161.1 | 4,801 | **30x faster** | 6,207 | 208 |
| **Batch Insert (100)** | 11,165 | - | - | 90 | - |

### Key Performance Insights

🚀 **KeraDB Advantages:**
- **68x faster** document counting
- **53x faster** single document inserts
- **30x faster** document deletion
- **12x faster** lookups by ID
- **2x faster** document updates
- Batch operations: 90 ops/second for 100 documents

⚡ **Why KeraDB is Faster:**
- Direct memory-mapped B-tree access
- Rust-based native implementation with zero-copy operations
- Optimized for document-oriented workloads
- No SQL parsing overhead

📊 **Use Cases:**
- Embedded applications requiring high-speed document operations
- Real-time data processing
- High-throughput logging and event storage
- Applications needing MongoDB-like API with SQLite-level simplicity

### Run Benchmarks Yourself

```bash
pip install -e ".[dev]"
python -m pytest benchmarks/ -v --benchmark-only --benchmark-sort=name
```

Full benchmark details: See [dev-docs/BENCHMARK_RESULTS.md](dev-docs/BENCHMARK_RESULTS.md)

## Contributing

Contributions are welcome! Please see the main project repository for guidelines.

## License

MIT License - See LICENSE file for details

## Links

- **Documentation**: https://keradb.github.io
- **Repository**: https://github.com/keradb/keradb
- **Issues**: https://github.com/keradb/keradb/issues

## Changelog

### Version 0.1.0

- Initial release
- MongoDB-compatible document operations
- Vector search with HNSW
- Multiple distance metrics
- Vector compression (delta, quantized)
- Python 3.8+ support
- Comprehensive test suite and benchmarks
