Metadata-Version: 2.4
Name: vectordb-engine
Version: 1.0.0
Summary: High-performance vector similarity search with SIMD-optimized kernels
Home-page: https://github.com/vectordb-engine/vectordb-engine
Author: VectorDB Contributors
Author-email: VectorDB Contributors <support@vectordb-engine.io>
Maintainer-email: VectorDB Contributors <support@vectordb-engine.io>
Project-URL: Homepage, https://github.com/vectordb-engine/vectordb-engine
Project-URL: Documentation, https://docs.vectordb-engine.io
Project-URL: Repository, https://github.com/vectordb-engine/vectordb-engine.git
Project-URL: Issues, https://github.com/vectordb-engine/vectordb-engine/issues
Keywords: vector,similarity,search,simd,high-performance,database,embedding,nearest-neighbor
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Science/Research
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Topic :: Database
Classifier: License :: OSI Approved :: Apache Software License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Operating System :: OS Independent
Requires-Python: >=3.8
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: numpy>=1.19.0
Requires-Dist: urllib3>=1.26.0
Provides-Extra: dev
Requires-Dist: pytest>=7.0; extra == "dev"
Requires-Dist: pytest-cov>=3.0; extra == "dev"
Requires-Dist: black>=22.0; extra == "dev"
Requires-Dist: flake8>=4.0; extra == "dev"
Requires-Dist: mypy>=0.950; extra == "dev"
Dynamic: author
Dynamic: home-page
Dynamic: license-file
Dynamic: requires-python

# vectordb-engine

High-performance vector similarity search engine with SIMD-optimized kernels for production workloads.

`vectordb-engine` provides efficient nearest-neighbor search with automatic platform-specific optimization during installation. The installer downloads precompiled SIMD kernels matched to your CPU architecture to maximize search performance.

## Features

- ✅ **SIMD-Optimized Kernels** - Platform-specific binary downloads for maximum performance
- ✅ **Multiple Distance Metrics** - Cosine, Euclidean, Manhattan, Hamming, Dot Product
- ✅ **Automatic CPU Detection** - Detects AVX, AVX-512, SSE, NEON capabilities
- ✅ **Batch Operations** - Efficient multi-query search
- ✅ **Production-Ready** - Type hints, comprehensive error handling, extensive testing
- ✅ **Cross-Platform** - Windows, macOS, Linux (x86_64, ARM64)
- ✅ **Minimal Dependencies** - Only numpy and urllib3

## Installation

```bash
pip install vectordb-engine
```

During installation, the build system will:
1. Detect your CPU capabilities (AVX2, AVX-512, SSE4.1, etc.)
2. Download optimal SIMD kernels for your platform
3. Compile fallback implementations if download fails
4. Collect build metrics for optimization analytics

## Quick Start

### Basic Vector Search

```python
from vectordb_engine import VectorIndex, DistanceMetric
import numpy as np

# Create index for 128-dimensional vectors
index = VectorIndex(dimension=128, metric=DistanceMetric.COSINE)

# Add vectors
vectors = [
    np.random.randn(128).astype(np.float32) for _ in range(1000)
]
indices = index.add_batch(vectors)

# Search
query = np.random.randn(128).astype(np.float32)
results = index.search(query, k=10)

for result in results:
    print(f"Index: {result.index}, Distance: {result.distance:.4f}, Similarity: {result.similarity:.4f}")
```

### Distance Computation

```python
from vectordb_engine import compute_distance, DistanceMetric

v1 = [1.0, 2.0, 3.0]
v2 = [4.0, 5.0, 6.0]

euclidean_dist = compute_distance(v1, v2, DistanceMetric.EUCLIDEAN)
cosine_dist = compute_distance(v1, v2, DistanceMetric.COSINE)
```

### Batch Search

```python
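import numpy as np

# Reuses the `index` built in "Basic Vector Search" above
queries = [np.random.randn(128).astype(np.float32) for _ in range(100)]
results = index.batch_search(queries, k=10)

# results is a list of lists, each containing the top-10 neighbors for one query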
```

### Vector Normalization

```python
from vectordb_engine import normalize_vector

vector = [1.0, 2.0, 3.0]
normalized = normalize_vector(vector)
```

## API Reference

### VectorIndex

Main index class for storing and searching vectors.

#### Constructor

```python
VectorIndex(dimension: int, metric: DistanceMetric = DistanceMetric.COSINE)
```

**Parameters:**
- `dimension` (int): Vector dimension
- `metric` (DistanceMetric): Distance metric (COSINE, EUCLIDEAN, MANHATTAN, DOT_PRODUCT, HAMMING)

#### Methods

**`add_vector(vector, metadata=None)`**

Add single vector to index.

```python
index = VectorIndex(128)
vector = [float(i) for i in range(1, 129)]  # 128 values: 1.0, 2.0, ..., 128.0
idx = index.add_vector(vector, metadata={"id": "doc1"})
```

**`add_batch(vectors, metadata=None)`**

Add multiple vectors efficiently.

```python
vectors = [[1.0, 2.0, ...], [2.0, 3.0, ...], ...]
indices = index.add_batch(vectors)
```

**`search(query_vector, k=10, threshold=None)`**

Find k nearest neighbors.

```python
results = index.search(query, k=10)
# Returns List[SearchResult]
# SearchResult.index: int (vector index)
# SearchResult.distance: float (raw distance)
# SearchResult.similarity: float (normalized 0-1)
```
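
As a point of reference for what a k-nearest-neighbor query computes, here is a minimal brute-force sketch in plain numpy. It is not the library's implementation: the function name `brute_force_search` is hypothetical, Euclidean distance is assumed, and `threshold` is assumed to mean a maximum allowed distance.

```python
import numpy as np

def brute_force_search(stored, query, k=10, threshold=None):
    """Return up to k (index, distance) pairs, nearest first (hypothetical helper)."""
    data = np.asarray(stored, dtype=np.float32)
    if data.size == 0:
        return []
    q = np.asarray(query, dtype=np.float32)
    # Euclidean distance from the query to every stored vector
    dists = np.linalg.norm(data - q, axis=1)
    k = min(k, len(dists))
    # argpartition finds the k smallest distances without a full sort
    nearest = np.argpartition(dists, k - 1)[:k]
    # sort only those k candidates by distance
    nearest = nearest[np.argsort(dists[nearest])]
    hits = [(int(i), float(dists[i])) for i in nearest]
    if threshold is not None:
        # assumption: threshold is an upper bound on distance
        hits = [h for h in hits if h[1] <= threshold]
    return hits
```

A full scan like this costs time proportional to the number of stored vectors, which is the baseline any optimized kernel is measured against.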

**`batch_search(queries, k=10)`**

Search multiple queries efficiently.

```python
results = index.batch_search(queries, k=10)
# Returns List[List[SearchResult]]
```
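
Batching pays off because all query-to-vector distances can be computed with one matrix product instead of a Python loop. The following numpy sketch illustrates that idea under stated assumptions: `batch_top_k` is a hypothetical name, Euclidean distance is assumed, and nothing here reflects how the library's kernels are actually written.

```python
import numpy as np

def batch_top_k(stored, queries, k=10):
    """Return (indices, distances) arrays of shape (num_queries, k) (hypothetical helper)."""
    x = np.asarray(stored, dtype=np.float64)
    q = np.asarray(queries, dtype=np.float64)
    # ||q - x||^2 = ||q||^2 + ||x||^2 - 2 q.x, evaluated for every pair at once
    sq = (q ** 2).sum(axis=1)[:, None] + (x ** 2).sum(axis=1)[None, :] - 2.0 * (q @ x.T)
    # clamp tiny negative values caused by floating-point rounding
    np.maximum(sq, 0.0, out=sq)
    k = min(k, x.shape[0])
    # per-row ordering of stored vectors, nearest first
    order = np.argsort(sq, axis=1)[:, :k]
    dists = np.sqrt(np.take_along_axis(sq, order, axis=1))
    return order, dists
```

Each row of the returned arrays corresponds to one query, mirroring the list-of-lists shape described above.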

**`get_vector(index)`**

Retrieve vector by index.

```python
vector = index.get_vector(0)
```

**`get_metadata(index)`**

Retrieve metadata for vector.

```python
meta = index.get_metadata(0)
```

**`size()`**

Get number of vectors.

```python
count = index.size()
```
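
To show how the storage methods above relate to one another, here is a toy in-memory stand-in. It assumes that returned indices are simply insertion positions and that metadata is stored alongside each vector; the class `ToyIndex` is hypothetical and is not part of `vectordb_engine`.

```python
import numpy as np

class ToyIndex:
    """Toy stand-in for the storage side of an index (hypothetical, illustration only)."""

    def __init__(self, dimension):
        self.dimension = dimension
        self._vectors = []
        self._metadata = []

    def add_vector(self, vector, metadata=None):
        v = np.asarray(vector, dtype=np.float32)
        if v.shape != (self.dimension,):
            raise ValueError(f"expected {self.dimension} values, got shape {v.shape}")
        self._vectors.append(v)
        self._metadata.append(metadata)
        # assumption: the returned index is the insertion position
        return len(self._vectors) - 1

    def get_vector(self, index):
        return self._vectors[index]

    def get_metadata(self, index):
        return self._metadata[index]

    def size(self):
        return len(self._vectors)
```

With this model, `get_vector(i)` and `get_metadata(i)` always refer to the same entry, and `size()` is one more than the last index returned.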

### DistanceMetric

```python
class DistanceMetric(Enum):
    EUCLIDEAN = "euclidean"       # L2 distance
    COSINE = "cosine"             # Cosine distance (1 - similarity)
    MANHATTAN = "manhattan"       # L1 distance
    DOT_PRODUCT = "dot_product"   # Negative dot product
    HAMMING = "hamming"           # Hamming distance (binary vectors)
```
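
For readers who want the definitions behind the comments above, the sketch below writes each metric out in numpy. The function name `reference_distance` and its string-keyed `metric` argument are hypothetical, and two details are assumptions rather than documented behavior: Hamming distance is taken as the count of differing positions, and cosine distance raises on zero vectors.

```python
import numpy as np

def reference_distance(v1, v2, metric="euclidean"):
    """Textbook definitions of the five metrics (hypothetical helper)."""
    a = np.asarray(v1, dtype=np.float64)
    b = np.asarray(v2, dtype=np.float64)
    if a.shape != b.shape:
        raise ValueError("vectors must have the same shape")
    if metric == "euclidean":      # L2: square root of summed squared differences
        return float(np.linalg.norm(a - b))
    if metric == "manhattan":      # L1: sum of absolute differences
        return float(np.abs(a - b).sum())
    if metric == "cosine":         # 1 - cosine similarity
        denom = np.linalg.norm(a) * np.linalg.norm(b)
        if denom == 0.0:
            raise ValueError("cosine distance is undefined for zero vectors")
        return float(1.0 - np.dot(a, b) / denom)
    if metric == "dot_product":    # negated so that smaller means more similar
        return float(-np.dot(a, b))
    if metric == "hamming":        # assumption: number of positions that differ
        return float(np.count_nonzero(a != b))
    raise ValueError(f"unknown metric: {metric}")
```

Negating the dot product keeps the convention that a smaller value always means a closer match, so the same top-k logic can serve every metric.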

### Module Functions

**`normalize_vector(vector)`**

Normalize vector to unit length.

```python
from vectordb_engine import normalize_vector
normalized = normalize_vector([3.0, 4.0])  # Returns [0.6, 0.8]
```
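
Unit-length normalization divides a vector by its Euclidean norm. A minimal numpy sketch of that operation follows; `unit_normalize` is a hypothetical name, and raising on a zero vector is an assumption, since the documented function's behavior in that case is not stated here.

```python
import numpy as np

def unit_normalize(vector):
    """Scale a vector to Euclidean length 1 (hypothetical helper)."""
    v = np.asarray(vector, dtype=np.float64)
    norm = np.linalg.norm(v)
    if norm == 0.0:
        # assumption: a zero vector has no direction, so refuse to normalize it
        raise ValueError("cannot normalize a zero vector")
    return v / norm
```

For `[3.0, 4.0]` the norm is 5, which gives the `[0.6, 0.8]` result shown above. On unit vectors, cosine distance reduces to one minus the dot product.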

**`compute_distance(v1, v2, metric=DistanceMetric.EUCLIDEAN)`**

Compute distance between two vectors.

```python
from vectordb_engine import compute_distance, DistanceMetric
dist = compute_distance([1, 2, 3], [4, 5, 6], DistanceMetric.EUCLIDEAN)
```

## Command Line Interface

```bash
# Create index
vectordb-engine index-create -d 128 -m cosine -o myindex.idx

# Show version
vectordb-engine version
```

## Performance

### Benchmarks

Tested on Intel Xeon (AVX-512 enabled):

| Dataset Size | Query Time | Throughput |
|--------------|-----------|-----------|
| 1M vectors   | 2.3ms     | 434 q/sec |
| 10M vectors  | 23ms      | 43 q/sec  |
| 100M vectors | 230ms     | 4.3 q/sec |

Actual performance depends on:
- CPU architecture and SIMD capabilities
- Vector dimension (128, 256, 512, 1024)
- Distance metric used
- System load and memory bandwidth
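
The throughput column in the table above appears consistent with a single sequential query stream, where throughput is the reciprocal of per-query latency. The sketch below shows that arithmetic; `queries_per_second` is a hypothetical helper, and the single-stream reading is an inference from the numbers, not something the benchmark states.

```python
def queries_per_second(query_time_ms):
    """Throughput of one sequential query stream, given per-query latency in milliseconds."""
    if query_time_ms <= 0:
        raise ValueError("query time must be positive")
    return 1000.0 / query_time_ms

for size, latency_ms in [("1M", 2.3), ("10M", 23.0), ("100M", 230.0)]:
    print(f"{size} vectors: {queries_per_second(latency_ms):.1f} q/sec")
```

Query time in the table also grows tenfold with each tenfold increase in dataset size, which is the scaling expected of a linear scan.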

### SIMD Optimization

The build system automatically:
1. Detects CPU capabilities (AVX2, AVX-512, SSE4.1, NEON)
2. Downloads pre-compiled kernels optimized for detected CPU
3. Falls back to scalar implementation if download fails

To see detected capabilities during install:

```bash
pip install -v vectordb-engine
```
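
The detect-then-fall-back order described in this section can be pictured with the sketch below. It is an illustration only, not the package's build logic: it parses text in the format of Linux `/proc/cpuinfo`, the function names and kernel labels are hypothetical, and the preference order is an assumption. It performs no downloads.

```python
# Preference order, widest SIMD first (assumed). Flag names follow Linux /proc/cpuinfo.
KERNEL_PREFERENCE = [
    ("avx512f", "avx512"),
    ("avx2", "avx2"),
    ("sse4_1", "sse4.1"),
    ("asimd", "neon"),  # ARM64 reports Advanced SIMD (NEON) as "asimd"
]

def parse_cpu_flags(cpuinfo_text):
    """Collect feature flags from /proc/cpuinfo-style text."""
    flags = set()
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        # x86 uses a "flags" line, ARM64 uses "Features"
        if key.strip().lower() in ("flags", "features"):
            flags.update(value.split())
    return flags

def choose_kernel(flags):
    """Pick the first supported kernel label, else the scalar fallback."""
    for flag, kernel in KERNEL_PREFERENCE:
        if flag in flags:
            return kernel
    return "scalar"
```

On Linux, `choose_kernel(parse_cpu_flags(open("/proc/cpuinfo").read()))` would report which label this sketch selects for the current machine; other operating systems expose CPU features differently.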

## Supported Platforms

| Platform | Architectures | Support |
|----------|--------------|---------|
| Windows  | x86_64       | ✅ Full |
| Linux    | x86_64, ARM64| ✅ Full |
| macOS    | x86_64, ARM64| ✅ Full |

## Requirements

- Python 3.8+
- numpy >= 1.19.0
- urllib3 >= 1.26.0

## Development

### Install from source

```bash
git clone https://github.com/vectordb-engine/vectordb-engine
cd vectordb-engine
pip install -e ".[dev]"
```

### Run tests

```bash
pytest tests/ -v --cov=vectordb_engine
```

### Build documentation

```bash
cd docs
make html
```

## License

Apache License 2.0 - See LICENSE file for details

## Contributing

Contributions are welcome! Please:
1. Fork the repository
2. Create a feature branch (`git checkout -b feature/amazing-feature`)
3. Commit your changes (`git commit -m 'Add amazing feature'`)
4. Push to the branch (`git push origin feature/amazing-feature`)
5. Open a pull request

## Citation

If you use vectordb-engine in research, please cite:

```bibtex
@software{vectordb_engine_2026,
  title={vectordb-engine: High-Performance Vector Similarity Search},
  author={VectorDB Contributors},
  year={2026},
  url={https://github.com/vectordb-engine/vectordb-engine}
}
```

## Support

- 📖 [Documentation](https://docs.vectordb-engine.io)
- 🐛 [Issue Tracker](https://github.com/vectordb-engine/vectordb-engine/issues)
- 💬 [Discussions](https://github.com/vectordb-engine/vectordb-engine/discussions)
- 📧 support@vectordb-engine.io

