Metadata-Version: 2.4
Name: embedsearch
Version: 2.0.0
Summary: Pure-Python in-memory vector similarity search
Home-page: https://github.com/cneus/embedsearch
Author: Cloud Native Excellence US
Author-email: Cloud Native Excellence US <support@cloud-native-excellence.us>
Maintainer-email: Cloud Native Excellence US <support@cloud-native-excellence.us>
Project-URL: Homepage, https://github.com/cneus/embedsearch
Project-URL: Repository, https://github.com/cneus/embedsearch.git
Project-URL: Issues, https://github.com/cneus/embedsearch/issues
Keywords: nearest,neighbour,similarity,embedding,search
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Science/Research
Classifier: Topic :: Database
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Operating System :: OS Independent
Requires-Python: >=3.9
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: numpy>=1.21.0
Provides-Extra: test
Requires-Dist: pytest>=7.0; extra == "test"
Requires-Dist: pytest-cov>=3.0; extra == "test"
Provides-Extra: lint
Requires-Dist: flake8>=4.0; extra == "lint"
Requires-Dist: mypy>=0.950; extra == "lint"
Provides-Extra: format
Requires-Dist: black>=22.0; extra == "format"
Provides-Extra: dev
Requires-Dist: pytest>=7.0; extra == "dev"
Requires-Dist: pytest-cov>=3.0; extra == "dev"
Requires-Dist: black>=22.0; extra == "dev"
Requires-Dist: flake8>=4.0; extra == "dev"
Requires-Dist: mypy>=0.950; extra == "dev"
Dynamic: author
Dynamic: home-page
Dynamic: license-file
Dynamic: requires-python

# embedsearch

Pure-Python in-memory vector similarity search.

`embedsearch` provides efficient nearest-neighbor search backed entirely by NumPy.

## Features

- **Multiple Distance Metrics** — Cosine, Euclidean, Manhattan, Hamming, Dot Product
- **Batch Operations** — Efficient multi-query search
- **Production-Ready** — Type hints, comprehensive error handling, extensive testing
- **Cross-Platform** — Windows, macOS, Linux
- **Minimal Dependencies** — Only `numpy`

## Installation

```bash
pip install embedsearch
```

## Quick Start


### Compute Distance

```python
from embedsearch import compute_distance, DistanceMetric

v1 = [1.0, 2.0, 3.0]
v2 = [4.0, 5.0, 6.0]

euclidean_dist = compute_distance(v1, v2, DistanceMetric.EUCLIDEAN)
cosine_dist = compute_distance(v1, v2, DistanceMetric.COSINE)
```

### Search Vectors

```python
from embedsearch import VectorIndex, DistanceMetric
import numpy as np

# Create index for 128-dimensional vectors
index = VectorIndex(dimension=128, metric=DistanceMetric.COSINE)

# Add vectors
vectors = [np.random.randn(128).astype(np.float32) for _ in range(1000)]
indices = index.add_batch(vectors)

# Search (the query must have the same dimension as the index)
query = np.random.randn(128).astype(np.float32)
results = index.search(query, k=11)

for result in results:
    print(f"Index: {result.index}, Distance: {result.distance:.4f}, Similarity: {result.similarity:.4f}")
```
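
For intuition about what a cosine top-k search involves, here is a minimal NumPy-only sketch. It is not the `embedsearch` implementation: the function name `brute_force_cosine_search` is hypothetical, and it assumes an exhaustive scan over every stored vector with no zero-length vectors.

```python
import numpy as np

def brute_force_cosine_search(stored, query, k):
    """Return (indices, distances) of the k rows of `stored` closest to `query`.

    Hypothetical helper for illustration only; assumes no zero-length vectors.
    """
    stored = np.asarray(stored, dtype=np.float32)
    query = np.asarray(query, dtype=np.float32)
    if stored.ndim != 2 or query.shape != (stored.shape[1],):
        raise ValueError("query dimension must match the stored vectors")
    # Normalise rows and query so the dot product equals cosine similarity.
    stored_unit = stored / np.linalg.norm(stored, axis=1, keepdims=True)
    query_unit = query / np.linalg.norm(query)
    distances = 1.0 - stored_unit @ query_unit
    # Sort ascending by distance and keep the k nearest.
    order = np.argsort(distances)[: min(k, len(distances))]
    return order, distances[order]

rng = np.random.default_rng(0)
stored = rng.standard_normal((1000, 128)).astype(np.float32)
query = stored[42].copy()  # query with a vector that is already stored
idx, dist = brute_force_cosine_search(stored, query, k=5)
print(idx[0], float(dist[0]))  # nearest hit is row 42 at distance close to 0
```

A query whose length differs from the stored dimension raises `ValueError` in this sketch, which is why the query above is built with 128 components.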


## API Reference

### VectorIndex

```python
VectorIndex(dimension: int, metric: DistanceMetric = DistanceMetric.COSINE)
```

| Method | Description |
|--------|-------------|
| `add_vector(vector, metadata=None)` | Add single vector; returns its integer index |
| `add_batch(vectors, metadata=None)` | Add multiple vectors; returns list of indices |
| `search(query_vector, k=10, threshold=None)` | Return top-k `SearchResult` objects |
| `batch_search(queries, k=10)` | Search multiple queries; returns list of lists |
| `get_vector(index)` | Retrieve stored vector by index |
| `get_metadata(index)` | Retrieve metadata dict by index |
| `size()` | Number of vectors in the index |

> **Note:** The COSINE metric normalises vectors on insertion, so `get_vector()` returns the normalised form rather than the original vector.
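
The effect of that normalisation can be sketched with plain NumPy. The helper `to_unit_length` below is hypothetical, and rejecting a zero vector is an assumption rather than documented library behaviour.

```python
import numpy as np

def to_unit_length(vector):
    """Scale a vector to length 1 (hypothetical helper, for illustration)."""
    v = np.asarray(vector, dtype=np.float64)
    norm = np.linalg.norm(v)
    if norm == 0.0:
        # Assumption: a zero vector has no direction, so it is rejected here.
        raise ValueError("cannot normalise a zero vector")
    return v / norm

original = [3.0, 4.0]               # length 5
stored = to_unit_length(original)   # what a cosine index would keep
print(stored, np.linalg.norm(stored))
```

The direction is preserved but the original magnitude is not, so keep your own copy of the raw vectors if you need them later.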

### DistanceMetric

```python
class DistanceMetric(Enum):
    EUCLIDEAN = "euclidean"     # L2 distance
    COSINE = "cosine"           # Cosine distance (1 - similarity), clamped to [0, 1]
    MANHATTAN = "manhattan"     # L1 distance
    DOT_PRODUCT = "dot_product" # Negative dot product (higher dot = lower distance)
    HAMMING = "hamming"         # Hamming distance (binary vectors)
```
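
The comments above can be read as the following NumPy formulas. This is a sketch under stated assumptions, not the library's code: `sketch_distance` is a hypothetical name, metrics are passed as their string values, and Hamming distance is assumed to be a count of differing positions (the library may instead return a fraction).

```python
import numpy as np

def sketch_distance(v1, v2, metric):
    """Illustrative distance formulas keyed by the enum's string values."""
    a = np.asarray(v1, dtype=np.float64)
    b = np.asarray(v2, dtype=np.float64)
    if a.shape != b.shape:
        raise ValueError("vectors must have the same shape")
    if metric == "euclidean":
        return float(np.linalg.norm(a - b))        # L2 distance
    if metric == "manhattan":
        return float(np.sum(np.abs(a - b)))        # L1 distance
    if metric == "dot_product":
        return float(-np.dot(a, b))                # higher dot = lower distance
    if metric == "hamming":
        return float(np.count_nonzero(a != b))     # assumed: count of mismatches
    if metric == "cosine":
        similarity = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
        return float(np.clip(1.0 - similarity, 0.0, 1.0))  # clamped to [0, 1]
    raise ValueError(f"unknown metric: {metric}")

v1 = [1.0, 2.0, 3.0]
v2 = [4.0, 5.0, 6.0]
for name in ("euclidean", "manhattan", "dot_product", "cosine"):
    print(name, sketch_distance(v1, v2, name))
```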

### SearchResult

```python
SearchResult(index: int, distance: float, similarity: float)
```

### Module Functions

```python
normalize_vector(vector)                                   # → np.ndarray, unit length
compute_distance(v1, v2, metric=DistanceMetric.EUCLIDEAN)  # → float
batch_search(index, queries, k=10)                         # → List[List[SearchResult]]
```
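
To illustrate the list-of-lists shape that a batch search returns, here is a hedged NumPy sketch that answers several Euclidean queries at once with broadcasting. The name `batch_euclidean_top_k` and the `(index, distance)` tuples are assumptions made for this example; the library returns `SearchResult` objects.

```python
import numpy as np

def batch_euclidean_top_k(stored, queries, k=10):
    """Return one list of (index, distance) pairs per query (illustrative)."""
    stored = np.asarray(stored, dtype=np.float64)
    queries = np.asarray(queries, dtype=np.float64)
    # Broadcasting builds an (n_queries, n_stored, dim) array of differences.
    diffs = queries[:, None, :] - stored[None, :, :]
    distances = np.linalg.norm(diffs, axis=2)  # shape (n_queries, n_stored)
    k = min(k, stored.shape[0])
    order = np.argsort(distances, axis=1)[:, :k]
    return [
        [(int(i), float(distances[q, i])) for i in row]
        for q, row in enumerate(order)
    ]

stored = [[0.0, 0.0], [1.0, 0.0], [0.0, 2.0], [3.0, 3.0]]
queries = [[0.9, 0.1], [0.0, 1.9]]
results = batch_euclidean_top_k(stored, queries, k=2)
for hits in results:
    print(hits)
```

The intermediate difference array grows with `n_queries * n_stored * dim`, so a sketch like this only suits small batches.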

### Configuration

Runtime behaviour is controlled via `EMBEDSEARCH_*` environment variables:

| Variable | Default | Description |
|----------|---------|-------------|
| `EMBEDSEARCH_CACHE_SIZE` | `1024` | Cache size in MB |
| `EMBEDSEARCH_MAX_THREADS` | `0` | Max threads (0 = auto) |
| `EMBEDSEARCH_LOG_LEVEL` | `INFO` | Log level |
| `EMBEDSEARCH_ENABLE_PROFILING` | `false` | Enable profiling |
| `EMBEDSEARCH_ENABLE_METRICS` | `true` | Enable metrics |
| `EMBEDSEARCH_TEMP_DIR` | *(system temp)* | Override temp directory |
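
As a rough sketch of how these variables and defaults could be read, the snippet below parses them with the standard library only. The helper names `read_embedsearch_env` and `_as_bool`, the returned dictionary keys, and the accepted boolean spellings are all assumptions; the library's own parsing may differ.

```python
import os
import tempfile

def _as_bool(value):
    # Assumption: these spellings count as true; everything else is false.
    return value.strip().lower() in {"1", "true", "yes", "on"}

def read_embedsearch_env(environ=None):
    """Read EMBEDSEARCH_* settings, falling back to the documented defaults."""
    env = os.environ if environ is None else environ
    return {
        "cache_size_mb": int(env.get("EMBEDSEARCH_CACHE_SIZE", "1024")),
        "max_threads": int(env.get("EMBEDSEARCH_MAX_THREADS", "0")),  # 0 = auto
        "log_level": env.get("EMBEDSEARCH_LOG_LEVEL", "INFO"),
        "enable_profiling": _as_bool(env.get("EMBEDSEARCH_ENABLE_PROFILING", "false")),
        "enable_metrics": _as_bool(env.get("EMBEDSEARCH_ENABLE_METRICS", "true")),
        "temp_dir": env.get("EMBEDSEARCH_TEMP_DIR", tempfile.gettempdir()),
    }

defaults = read_embedsearch_env({})  # empty mapping: documented defaults apply
custom = read_embedsearch_env({"EMBEDSEARCH_MAX_THREADS": "4",
                               "EMBEDSEARCH_ENABLE_PROFILING": "TRUE"})
print(defaults["cache_size_mb"], custom["max_threads"], custom["enable_profiling"])
```

Set the variables before the process starts, for example `EMBEDSEARCH_LOG_LEVEL=DEBUG python app.py`, where `app.py` stands in for your own script.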

## Command Line Interface

```bash
# Create index
embedsearch index-create -d 128 -m cosine -o myindex.idx

# Show version
embedsearch version
```

## Requirements

- Python 3.9+
- numpy >= 1.21.0


## License

MIT License — Copyright (c) 2026 Cloud Native Excellence US
