Metadata-Version: 2.4
Name: dcee
Version: 1.0.1
Summary: Delta-Compressed Embedding Engine — compressed approximate similarity search for correlated embeddings
Author: DCEE contributors
License: MIT
License-File: LICENSE
Keywords: ANN,Artificial Intelligence,approximate-nearest-neighbor,compression,embeddings,faiss,hugging-face,machine learning,vector database,vector-search
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Requires-Python: >=3.10
Requires-Dist: numpy>=1.22
Requires-Dist: scikit-learn>=1.2
Requires-Dist: tqdm>=4.64
Provides-Extra: dev
Requires-Dist: pytest>=7.0; extra == 'dev'
Requires-Dist: ruff>=0.1; extra == 'dev'
Description-Content-Type: text/markdown

# DCEE — Delta-Compressed Embedding Engine

Compressed approximate similarity search for **correlated** embedding sequences, such as chunks from one document or adjacent logs. DCEE uses k-means routing, delta coding inside clusters, and optional **Adaptive Margin Probing (AMP)** at query time. GPU math through **CuPy** is optional; without it, DCEE falls back to NumPy.
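
To illustrate why delta coding suits correlated vectors, here is a minimal NumPy sketch. It is not DCEE's actual encoder: the function names `delta_encode` and `delta_decode` are hypothetical, and the scheme (a full keyframe every `keyframe_every` rows, with differences from the previous row in between) is an assumption based on the `keyframe_every` setting named under Configuration.

```python
import numpy as np

def delta_encode(vectors, keyframe_every=8):
    """Keep every keyframe_every-th row in full; store the rest as row-to-row differences."""
    keyframes = vectors[::keyframe_every].copy()
    deltas = np.zeros_like(vectors)
    deltas[1:] = vectors[1:] - vectors[:-1]
    deltas[::keyframe_every] = 0  # keyframe rows are stored in full, so they carry no delta
    return keyframes, deltas

def delta_decode(keyframes, deltas, keyframe_every=8):
    """Rebuild the rows by accumulating deltas forward from each keyframe."""
    out = np.empty_like(deltas)
    for start in range(0, len(deltas), keyframe_every):
        block = deltas[start:start + keyframe_every].copy()
        block[0] = keyframes[start // keyframe_every]
        out[start:start + keyframe_every] = np.cumsum(block, axis=0)
    return out

# Hypothetical data: a slow random walk, so neighbouring rows are strongly correlated.
rng = np.random.default_rng(0)
walk = np.cumsum(rng.normal(scale=0.01, size=(64, 16)), axis=0).astype(np.float32)
keyframes, deltas = delta_encode(walk)
restored = delta_decode(keyframes, deltas)
print("max reconstruction error:", float(np.abs(restored - walk).max()))
print("value range, raw vs delta:", float(np.ptp(walk)), float(np.ptp(deltas)))
```

When neighbouring rows are similar, the deltas span a much narrower range than the raw values, which is what lets a quantizer spend fewer bits on them. This sketch omits quantization entirely.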

## Install

From [PyPI](https://pypi.org/project/dcee/) (recommended):

```bash
pip install dcee
```

Install a specific release:

```bash
pip install "dcee==1.0.1"
```

**Dependencies** (pulled in automatically): `numpy`, `scikit-learn`, `tqdm`. Python **3.10+**.

**Optional GPU acceleration:** install a [CuPy](https://docs.cupy.dev/) wheel that matches your CUDA toolkit (e.g. `cupy-cuda12x`). If CuPy is not installed, DCEE runs on **NumPy** (CPU).
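
The fallback described above follows a common pattern, sketched below. This is not DCEE's internal code: the names `xp`, `GPU`, `to_host`, and `top_scores` are hypothetical, and the sketch only assumes that CuPy mirrors the NumPy array API for the calls used here.

```python
import numpy as np

try:
    import cupy as xp                    # NumPy-compatible arrays on the GPU
    xp.cuda.runtime.getDeviceCount()     # raises if no usable CUDA device is present
    GPU = True
except Exception:
    xp = np                              # CPU fallback: same code path, NumPy arrays
    GPU = False

def to_host(a):
    """Return a NumPy array regardless of which backend produced `a`."""
    return a.get() if GPU else a

def top_scores(query, matrix, k=3):
    """Inner-product scores of `query` against the rows of `matrix`, best k first."""
    scores = xp.asarray(matrix) @ xp.asarray(query)
    idx = xp.argsort(-scores)[:k]
    return to_host(idx), to_host(scores[idx])

print("backend:", "cupy" if GPU else "numpy")
```

In DCEE itself, `is_gpu_available()` (used in the quick start) reports which backend is active.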

**Development** (editable install from a clone):

```bash
git clone <repository-url>
cd DCEE
pip install -e ".[dev]"
```

## Quick start

```python
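import numpy as np
from dcee import DCEEConfig, DCEEEngine, is_gpu_available

# Reports whether the optional CuPy backend is in use.
print("GPU:", is_gpu_available())

# Random unit-norm vectors stand in for real embeddings.
emb = np.random.randn(10_000, 128).astype(np.float32)
emb /= np.linalg.norm(emb, axis=1, keepdims=True)

# Heuristic defaults scaled to the dataset size and dimension.
cfg = DCEEConfig.tuned_for(len(emb), emb.shape[1])
engine = DCEEEngine(cfg)
engine.build(emb)

# Each search result is an (index, score) pair.
q = emb[0]
for idx, score in engine.search(q, top_k=5):
    print(idx, score)

# Persist the index, then reload it from disk.
engine.save("index.dce2")

loaded = DCEEEngine.from_file("index.dce2")
print(loaded.search(q, top_k=3))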
```

## Configuration

- **`DCEEConfig`**: configuration object with defaults for `dim`, `n_clusters`, `keyframe_every`, `quantization`, `n_probe`, `n_probe_max`, the AMP settings (`adaptive_probe`, `adaptive_probe_margin`), `top_k_refine`, and `verbose`.
- **`DCEEConfig.tuned_for(n_vectors, dim)`**: returns heuristic defaults scaled to the number of vectors and their dimension.

Set `verbose=False` for quiet builds and loads.
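
The probing parameters listed above (`n_probe`, `n_probe_max`, `adaptive_probe_margin`) suggest a rule along the following lines. This is a sketch of one plausible reading, not DCEE's documented AMP algorithm: the function `select_probes` is hypothetical, and it assumes AMP probes every cluster whose centroid score falls within the margin of the best one, bounded below by `n_probe` and above by `n_probe_max`.

```python
import numpy as np

def select_probes(centroid_scores, n_probe=2, n_probe_max=8, margin=0.05):
    """Pick cluster ids to scan, widening the probe set when top centroids are close."""
    scores = np.asarray(centroid_scores, dtype=np.float64)
    order = np.argsort(-scores)                       # best-scoring cluster first
    best = scores[order[0]]
    within = int(np.sum(scores >= best - margin))     # clusters nearly as good as the best
    count = min(max(n_probe, within), n_probe_max, len(order))
    return order[:count]

# A clear winner needs few probes; near-ties widen the search.
print(select_probes([0.90, 0.40, 0.30, 0.20], n_probe=1))
print(select_probes([0.90, 0.88, 0.50, 0.87], n_probe=1))
```

Under this assumption, a query that sits clearly inside one cluster costs only `n_probe` scans, while an ambiguous query near a cluster boundary scans more, up to `n_probe_max`.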

## License

MIT. See `LICENSE` in the repository.
