Metadata-Version: 2.2
Name: isage-anns
Version: 0.2.0
Summary: SAGE ANNS: Approximate Nearest Neighbor Search algorithms with optional native backends and unified Python interface
Keywords: approximate nearest neighbor,anns,vector search,similarity search,faiss,hnsw,diskann,product quantization,locality sensitive hashing
Author-Email: IntelliStream Team <shuhao_zhang@hust.edu.cn>
License: MIT
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Science/Research
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Topic :: Scientific/Engineering :: Information Analysis
Classifier: Programming Language :: C++
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Project-URL: Homepage, https://github.com/intellistream/sage-anns
Project-URL: Repository, https://github.com/intellistream/sage-anns.git
Project-URL: Bug Tracker, https://github.com/intellistream/sage-anns/issues
Project-URL: Documentation, https://github.com/intellistream/sage-anns#readme
Project-URL: Changelog, https://github.com/intellistream/sage-anns/blob/main/CHANGELOG.md
Requires-Python: >=3.10
Requires-Dist: numpy>=1.20.0
Requires-Dist: faiss-cpu>=1.8.0; platform_system != "Windows"
Provides-Extra: full
Provides-Extra: dev
Requires-Dist: isage-anns[full]; extra == "dev"
Requires-Dist: pytest>=7.0; extra == "dev"
Requires-Dist: pytest-cov>=4.0; extra == "dev"
Description-Content-Type: text/markdown

# SAGE ANNS

**Approximate Nearest Neighbor Search algorithms with a unified Python interface**

[![PyPI version](https://badge.fury.io/py/isage-anns.svg)](https://pypi.org/project/isage-anns/)
[![Python 3.10+](https://img.shields.io/badge/python-3.10+-blue.svg)](https://www.python.org/downloads/)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)

## Overview

`isage-anns` provides state-of-the-art Approximate Nearest Neighbor Search (ANNS) algorithms with a unified Python interface. The package combines optional native backends with direct NumPy implementations for algorithms that are now maintained in-tree. This package is part of the [SAGE](https://github.com/intellistream/SAGE) ecosystem.

It is the algorithm provider layer: the package owns ANN implementations and a unified Python factory, while downstream packages such as sageVDB consume it through adapter layers. Today, sageVDB integrates with `isage-anns` through an optional Python backend rather than a native C++ plugin.

## Features

- 🚀 **Mixed Backend Strategy**: optional native backends plus maintained in-tree NumPy implementations
- 🎯 **Multiple Algorithms**: default FAISS plus native NNDescent, DPG, LSHAPG, and OnlinePQ support, with optional DiskANN, VSAG HNSW, GTI, and PLSH exposure when their backend modules are installed
- 🔧 **Unified Interface**: Single API for all algorithms
- 📦 **Easy Installation**: Pre-built wheels for major platforms
- 🔌 **Composable Integration**: Works standalone and can be consumed by SAGE packages through explicit adapter layers

## Current `create_index` Support

| Algorithm | Factory name | Type | Features |
|-----------|--------------|------|----------|
| **FAISS Generic** | `faiss` | Generic | Raw FAISS factory-string wrapper for custom index types |
| **FAISS Flat** | `faiss_flat` | Exact | Explicit brute-force exact-search variant |
| **FAISS HNSW** | `faiss_hnsw` | Graph | Default HNSW path through the upstream Python FAISS package |
| **FAISS IVF-Flat** | `faiss_ivf_flat` | IVF | Explicit IVF-Flat variant with `nlist` and `nprobe` controls |
| **FAISS IVF-PQ** | `faiss_ivf_pq` | IVF+PQ | Explicit IVF-PQ variant with `nlist`, `m`, `nbits`, and `nprobe` |
| **NNDescent** | `nndescent` | Graph | Native NumPy-based NN-Descent-style graph index without external backend dependencies |
| **DPG** | `dpg` | Graph | Native directed-pruned hierarchical graph built on the NNDescent base graph |
| **LSHAPG** | `lshapg` | Hash+Graph | Native LSH-shortlisted graph index built on the DPG base graph |
| **OnlinePQ** | `onlinepq` | Quantization | Native IVF-style residual product quantization index with exact rerank over PQ shortlists |
| **DiskANN Dynamic** | `diskann` | Graph | Optional; appears only when `diskannpy` is installed |
| **VSAG HNSW** | `vsag_hnsw` | Graph | Optional; appears only when `pyvsag` is installed |
| **GTI** | `gti` | Graph+Tree | Optional; appears only when `gti_wrapper` is installed |
| **PLSH** | `plsh` | Hash | Optional; appears only when `plsh_python` is installed |

The repository also contains additional implementation sources under `implementations/`, but the default package build only advertises algorithms whose runtime backends are actually importable in the current environment.
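As a rough illustration of the importability rule above, the following sketch checks whether each optional backend module from the table can be found in the current environment. The helper name `optional_backend_status` is hypothetical and not part of `isage-anns`; the package may apply additional checks beyond module discovery.

```python
from importlib.util import find_spec

# Factory name -> backend module, copied from the table above.
OPTIONAL_BACKENDS = {
    "diskann": "diskannpy",
    "vsag_hnsw": "pyvsag",
    "gti": "gti_wrapper",
    "plsh": "plsh_python",
}


def optional_backend_status():
    """Return {factory_name: bool} based on whether the module can be found.

    Hypothetical helper: it only asks the import system for a module spec
    and does not import or initialise the backend itself.
    """
    return {
        name: find_spec(module) is not None
        for name, module in OPTIONAL_BACKENDS.items()
    }


for name, ok in optional_backend_status().items():
    print(f"{name:10s} {'available' if ok else 'missing'}")
```

This is only a diagnostic aid for explaining why a name is absent; `list_algorithms()` remains the authoritative view.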

The older PyCANDY/CANDY wrapper path has been removed from the package. The supported default surface is the runtime algorithm list returned by `list_algorithms()`.

## Installation

See [CHANGELOG.md](CHANGELOG.md) for the latest release notes and packaging-facing changes.

### From PyPI (Recommended)

```bash
pip install isage-anns
```

On supported non-Windows platforms this installs the FAISS-backed default path plus the native `nndescent`, `dpg`, `lshapg`, and `onlinepq` implementations. Optional backends such as DiskANN, VSAG, GTI, and PLSH must be built or installed separately before they appear in `list_algorithms()`.

### From Source

```bash
# Clone the repository
git clone https://github.com/intellistream/sage-anns.git
cd sage-anns

# Default build: Python package surface + current wrappers
pip install -e .
```

### Requirements

- Python >= 3.10
- CMake >= 3.10
- C++17 compiler (g++ or clang++)
- System libraries:
  ```bash
  # Ubuntu/Debian
  sudo apt-get install build-essential cmake libopenblas-dev
  
  # macOS
  brew install cmake libomp
  ```

## Quick Start

```python
import numpy as np
from sage_anns import create_index

# Create an index
index = create_index(
    "faiss_hnsw",
    dimension=128,
    metric="l2"
)

# Build index with data
data = np.random.randn(10000, 128).astype('float32')
index.build(data)

# Search
query = np.random.randn(10, 128).astype('float32')
distances, indices = index.search(query, k=10)

print(f"Top-10 nearest neighbors: {indices}")
print(f"Distances: {distances}")
```

## Usage Examples

The snippets below reuse the `data` and `query` arrays created in the Quick Start.

### FAISS HNSW

```python
from sage_anns import create_index

index = create_index(
    "faiss_hnsw",
    dimension=128,
    metric="l2",
    M=32,  # HNSW parameter
    ef_construction=200
)
index.build(data)
index.search(query, k=10)
```

### Current Factory Exposure

```python
from sage_anns import list_algorithms

print(list_algorithms())
# Example on a default install: ['dpg', 'faiss', 'faiss_flat', 'faiss_hnsw', 'faiss_ivf_flat', 'faiss_ivf_pq', 'lshapg', 'nndescent', 'onlinepq']
```

Additional wrappers may exist in the source tree before they are promoted into the default factory. Treat `list_algorithms()` as the authoritative runtime view of what the installed package currently exposes.
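Because the exposed algorithm list depends on the environment, calling code may want a preference order with a fallback. The sketch below shows one way to do that; `pick_algorithm` is a hypothetical helper, not a function provided by `isage-anns`, and it works on any list of names.

```python
def pick_algorithm(preferred, available):
    """Return the first name in `preferred` that also appears in `available`."""
    available = set(available)
    for name in preferred:
        if name in available:
            return name
    raise LookupError(f"none of {list(preferred)} is available")


# The default-install list shown above: optional backends are absent,
# so the FAISS HNSW path is selected.
default_install = [
    "dpg", "faiss", "faiss_flat", "faiss_hnsw", "faiss_ivf_flat",
    "faiss_ivf_pq", "lshapg", "nndescent", "onlinepq",
]
choice = pick_algorithm(["diskann", "vsag_hnsw", "faiss_hnsw"], default_install)
print(choice)  # faiss_hnsw
```

In real use, pass `list_algorithms()` as `available` and hand the result to `create_index(choice, dimension=..., metric=...)`.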

### DiskANN Dynamic Memory

Requires `diskannpy` to be installed first.

```python
from sage_anns import create_index

index = create_index(
    "diskann",
    dimension=128,
    metric="cosine",
    complexity=96,
    graph_degree=48,
    max_vectors=20000,
)
index.build(data)
index.search(query, k=10, complexity=96)
```

### FAISS IVF Variants

```python
from sage_anns import create_index

ivf_flat = create_index(
    "faiss_ivf_flat",
    dimension=128,
    metric="l2",
    nlist=128,
    nprobe=16,
)

ivf_pq = create_index(
    "faiss_ivf_pq",
    dimension=128,
    metric="l2",
    nlist=128,
    m=16,
    nbits=8,
    nprobe=16,
)
```
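In FAISS terminology, `nlist` is the number of inverted lists (cells) the vectors are partitioned into, and `nprobe` is how many of those cells are scanned per query. The toy NumPy class below illustrates that trade-off only. It is not the FAISS or `isage-anns` implementation: `ToyIVFFlat` is a hypothetical name, and the centroids are sampled from the data rather than trained with k-means.

```python
import numpy as np


def sq_l2(a, b):
    """Pairwise squared L2 distances between rows of a and rows of b."""
    return ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)


class ToyIVFFlat:
    """Illustrative inverted-file index (assumption: sampled centroids)."""

    def __init__(self, nlist, nprobe, seed=0):
        self.nlist = nlist
        self.nprobe = nprobe
        self.rng = np.random.default_rng(seed)

    def build(self, data):
        self.data = data
        # Pick nlist data points as centroids (a stand-in for k-means).
        picks = self.rng.choice(len(data), size=self.nlist, replace=False)
        self.centroids = data[picks].copy()
        # Each vector goes into the list of its nearest centroid.
        assign = sq_l2(data, self.centroids).argmin(axis=1)
        self.lists = [np.flatnonzero(assign == c) for c in range(self.nlist)]

    def search(self, query, k):
        """Search one query vector; scans only the nprobe closest lists."""
        order = sq_l2(query[None, :], self.centroids)[0].argsort()[: self.nprobe]
        cand = np.concatenate([self.lists[c] for c in order])
        dist = ((self.data[cand] - query) ** 2).sum(axis=1)
        top = dist.argsort()[:k]
        return dist[top], cand[top]


rng = np.random.default_rng(0)
vectors = rng.standard_normal((2000, 32)).astype("float32")
index = ToyIVFFlat(nlist=16, nprobe=4)
index.build(vectors)
distances, ids = index.search(vectors[0], k=5)
```

Raising `nprobe` toward `nlist` scans more candidates, which improves recall at the cost of query time; with `nprobe == nlist` the toy index degenerates to brute-force search.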

### Native NNDescent

```python
from sage_anns import create_index

index = create_index(
    "nndescent",
    dimension=128,
    metric="l2",
    graph_k=20,
    max_iterations=8,
)
index.build(data)
index.search(query, k=10)
```

### Native DPG

```python
from sage_anns import create_index

index = create_index(
    "dpg",
    dimension=128,
    metric="l2",
    graph_k=20,
    layer1_degree=10,
    max_iterations=8,
)
index.build(data)
index.search(query, k=10)
```

### Native LSHAPG

```python
from sage_anns import create_index

index = create_index(
    "lshapg",
    dimension=128,
    metric="l2",
    graph_k=20,
    layer1_degree=10,
    num_tables=8,
    num_hashes=10,
)
index.build(data)
index.search(query, k=10, exact_search=False)
```

### Native OnlinePQ

```python
from sage_anns import create_index

index = create_index(
    "onlinepq",
    dimension=128,
    metric="l2",
    coarse_clusters=32,
    fine_clusters=16,
    sub_quantizers=8,
    n_probe=4,
)
index.build(data)
index.search(query, k=10, exact_search=False)
```

### Removed Legacy Paths

The old `sage_anns.legacy.candy` and `sage_anns.algorithms.candy` modules are no longer shipped. They depended on the removed PyCANDY/Torch path and are intentionally unavailable.

### VSAG HNSW

Requires `pyvsag` to be installed first.

```python
from sage_anns import create_index

index = create_index(
    "vsag_hnsw",
    dimension=128,
    metric="cosine",
    M=16,
    ef_construction=100
)
index.build(data)
index.search(query, k=10)
```

### GTI (Graph-based Tree Index)

Requires `gti_wrapper` to be built and installed first.

```python
import numpy as np
from sage_anns import create_index

index = create_index(
    "gti",
    dimension=128,
    metric="l2",
    m=16,  # Max graph connections per node
    L=100  # Search depth parameter
)
index.build(data)

# GTI supports efficient dynamic insertions and deletions
new_vectors = np.random.randn(100, 128).astype('float32')
index.add(new_vectors)

# Search after insertions
index.search(query, k=10)
```

### PLSH (Parallel Locality-Sensitive Hashing)

Requires `plsh_python` to be built and installed first.

```python
from sage_anns import create_index

index = create_index(
    "plsh",
    dimension=128,
    metric="l2",
    k=10,  # Hash functions per table
    m=10,  # Number of hash tables
    num_threads=4
)
index.build(data)
index.search(query, k=10)

# PLSH is optimized for sparse vectors and high-dimensional data
```

## API Reference

### `create_index`

**Parameters:**
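- Positional `algorithm` (str): Algorithm name as returned by `list_algorithms()` (`faiss`, `faiss_flat`, `faiss_hnsw`, `faiss_ivf_flat`, `faiss_ivf_pq`, `nndescent`, `dpg`, `lshapg`, `onlinepq`, and, when their backends are installed, `diskann`, `vsag_hnsw`, `gti`, `plsh`)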
- `dimension` (int): Vector dimension
- `metric` (str): Distance metric (`l2`, `cosine`, `inner_product`)
- `**kwargs`: Algorithm-specific parameters
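
The three metric names are closely related once vectors are normalized to unit length. The NumPy sketch below demonstrates the standard identity between squared L2 distance and cosine similarity on unit vectors. It makes no claim about how `isage-anns` computes or scales each metric internally; all variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.standard_normal((1000, 128))
query = rng.standard_normal(128)


def unit(x):
    """Scale vectors to unit L2 norm along the last axis."""
    return x / np.linalg.norm(x, axis=-1, keepdims=True)


data_n, query_n = unit(data), unit(query)

# On unit vectors the inner product is the cosine similarity.
cosine = data_n @ query_n
sq_l2 = ((data_n - query_n) ** 2).sum(axis=1)

# Identity on unit vectors: ||a - b||^2 = 2 - 2 * cos(a, b)
identity_holds = np.allclose(sq_l2, 2.0 - 2.0 * cosine)

# Ascending L2 distance and descending cosine similarity rank alike.
top_by_l2 = np.argsort(sq_l2)[:10]
top_by_cos = np.argsort(-cosine)[:10]
```

A practical consequence is that, for pre-normalized data, an `l2` index and a `cosine` index should return the same neighbors up to approximation error.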

**Methods on the returned index:**
- `build(data)`: Build the index from a NumPy array
- `search(query, k)`: Search for the `k` nearest neighbors; returns `(distances, indices)`
- `add(vectors)`: Add vectors to the index
- `save(path)`: Save the index to disk
- `load(path)`: Load the index from disk
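
For code that should accept any index regardless of algorithm, the method list above can be written down as a structural type. This is a sketch under assumptions: `AnnIndex` and `NullIndex` are hypothetical names, the annotations are guesses, and whether `load` is an instance method or a constructor-style method is not stated here.

```python
from typing import Any, Protocol, Tuple, runtime_checkable


@runtime_checkable
class AnnIndex(Protocol):
    """Structural description of the documented index methods (assumed types)."""

    def build(self, data: Any) -> None: ...
    def search(self, query: Any, k: int, **kwargs: Any) -> Tuple[Any, Any]: ...
    def add(self, vectors: Any) -> None: ...
    def save(self, path: str) -> None: ...
    def load(self, path: str) -> None: ...


class NullIndex:
    """Test double that stores nothing and returns empty results."""

    def __init__(self):
        self.size = 0

    def build(self, data):
        self.size = len(data)

    def add(self, vectors):
        self.size += len(vectors)

    def search(self, query, k, **kwargs):
        return [], []

    def save(self, path):
        pass

    def load(self, path):
        pass


def describe(index: AnnIndex) -> str:
    """Accept anything that offers the five documented methods."""
    if not isinstance(index, AnnIndex):
        raise TypeError("object does not provide the expected index methods")
    return type(index).__name__


print(describe(NullIndex()))  # NullIndex
```

Note that `runtime_checkable` only verifies that the methods exist, not their signatures or return types.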

## Integration with SAGE

`isage-anns` is a standalone ANN library inside the SAGE ecosystem. Other SAGE packages can import it directly, or consume it through their own adapter layers.

```python
from sage.libs.anns import create_index

# Example: a higher-level SAGE package can delegate ANN creation to isage-anns
index = create_index("faiss_hnsw", dimension=128)
index.build(data)
```

### Current sageVDB Integration Status

- Current path: sageVDB consumes `isage-anns` through the optional Python backend selected with `create_database(..., backend="sage-anns")`.
- Native sageVDB C++ plugins are a separate boundary based on sageVDB's own `ANNSRegistry`; `isage-anns` is not currently registered there.
- Future path: a native C++ adapter could be added later, but that is not the current contract.

For the exact current boundary, see the sageVDB integration note in the sageVDB repository: <https://github.com/intellistream/sageVDB/blob/main/docs/sage_anns_integration.md>.

## Development

### Building from Source

```bash
# Clone with submodules (contains third-party libraries)
git clone --recursive https://github.com/intellistream/sage-anns.git
cd sage-anns

# Build all algorithms
./build_all.sh

# Or build specific algorithm
cd implementations/<algorithm>
mkdir build && cd build
cmake .. && make -j$(nproc)
```

### Running Tests

```bash
pip install -e ".[dev]"  # the dev extra pulls in pytest and pytest-cov
pytest tests/
```
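
Tests that exercise an optional backend need to be skipped when that backend is missing. The sketch below uses the standard-library `unittest` skip decorator, which `pytest` also honors. The test class and helper names are hypothetical, the DiskANN parameters are copied from the usage example above, and the final assertion assumes `search` returns one row of indices per query.

```python
import unittest
from importlib.util import find_spec


def backend_missing(*modules):
    """True if any of the named modules cannot be found."""
    return any(find_spec(m) is None for m in modules)


class DiskANNSmokeTest(unittest.TestCase):
    @unittest.skipIf(
        backend_missing("numpy", "sage_anns", "diskannpy"),
        "optional DiskANN backend not installed",
    )
    def test_build_and_search(self):
        import numpy as np
        from sage_anns import create_index

        data = np.random.randn(1000, 128).astype("float32")
        index = create_index(
            "diskann",
            dimension=128,
            metric="cosine",
            complexity=96,
            graph_degree=48,
            max_vectors=20000,
        )
        index.build(data)
        distances, indices = index.search(data[:5], k=10)
        # Assumption: one result row per query vector.
        self.assertEqual(len(indices), 5)
```

On a default install the test is reported as skipped rather than failed.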

## Performance

Benchmarks on 1M SIFT vectors (128-dim):

| Algorithm | Build Time | Query Time (10-NN) | Recall@10 |
|-----------|------------|-------------------|-----------|
| FAISS HNSW | 45s | 0.8ms | 0.95 |
| VSAG HNSW | 42s | 0.9ms | 0.94 |
| DiskANN | 120s | 1.2ms | 0.93 |
| CANDY | 50s | 1.0ms | 0.92 |

*Benchmarks run on Intel Xeon Silver 4214R @ 2.40GHz. The CANDY row refers to the legacy PyCANDY path, which is no longer shipped with the package.*
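
Recall@10 is conventionally the fraction of the true 10 nearest neighbors that the approximate search returns, averaged over queries. The sketch below computes it against a brute-force NumPy baseline. It follows the conventional definition rather than the exact script behind the table, and `exact_neighbors` and `recall_at_k` are illustrative names.

```python
import numpy as np


def exact_neighbors(data, queries, k):
    """Brute-force k nearest neighbors under squared L2 distance."""
    # ||q - d||^2 = ||q||^2 - 2 q.d + ||d||^2, computed for all pairs.
    d2 = (
        (queries ** 2).sum(axis=1)[:, None]
        - 2.0 * queries @ data.T
        + (data ** 2).sum(axis=1)[None, :]
    )
    return np.argsort(d2, axis=1)[:, :k]


def recall_at_k(approx_ids, exact_ids):
    """Fraction of true neighbors found, averaged over all queries."""
    hits = sum(
        len(set(a) & set(e))
        for a, e in zip(np.asarray(approx_ids).tolist(), exact_ids.tolist())
    )
    return hits / exact_ids.size


data = np.array([[0.0, 0.0], [1.0, 0.0], [5.0, 5.0]])
queries = np.array([[0.9, 0.0]])
truth = exact_neighbors(data, queries, k=2)   # nearest are ids 1, then 0
print(recall_at_k([[1, 2]], truth))           # 0.5
```

To evaluate a real index, pass the `indices` array returned by `index.search(query, k=10)` as `approx_ids`.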

## Contributing

We welcome contributions! Please see [CONTRIBUTING.md](CONTRIBUTING.md) for guidelines.

### Code Structure

```
sage-anns/
├── implementations/      # C++ source code
│   ├── faiss/
│   ├── diskann-ms/
│   ├── candy/
│   └── ...
├── python/              # Python bindings
│   └── sage_anns/
├── tests/               # Unit tests
├── CMakeLists.txt       # Build configuration
└── pyproject.toml       # Package metadata
```

## License

MIT License - see [LICENSE](LICENSE) for details.

## Citation

If you use this package in your research, please cite:

```bibtex
@software{sage_anns,
  title = {SAGE ANNS: Approximate Nearest Neighbor Search},
  author = {IntelliStream Team},
  year = {2026},
  url = {https://github.com/intellistream/sage-anns}
}
```

## Acknowledgements

This package integrates implementations from:
- [FAISS](https://github.com/facebookresearch/faiss) by Meta Research
- [DiskANN](https://github.com/microsoft/DiskANN) by Microsoft Research
- [SPTAG](https://github.com/microsoft/SPTAG) by Microsoft
- PUCK by ByteDance
- CANDY by IntelliStream Team

## Related Projects

- [SAGE](https://github.com/intellistream/SAGE) - Main framework
- [sage-benchmark](https://github.com/intellistream/sage-benchmark) - Benchmarking tools
- [NeuroMem](https://github.com/intellistream/NeuroMem) - Memory system using ANNS

## Support

- 📧 Email: shuhao_zhang@hust.edu.cn
- 🐛 Issues: [GitHub Issues](https://github.com/intellistream/sage-anns/issues)
- 💬 Discussions: [GitHub Discussions](https://github.com/intellistream/sage-anns/discussions)
