Metadata-Version: 2.4
Name: arrowspace_tuner
Version: 0.1.0
Summary: Hyperparameter discovery (eps auto-tuning) for ArrowSpace via Optuna.
Project-URL: Homepage, https://github.com/Genefold/arrowspace_tuner
Project-URL: Repository, https://github.com/Genefold/arrowspace_tuner.git
Author-email: Tommaso Moriondo <moriondotommaso@gmail.com>
License: Apache-2.0
License-File: LICENSE
Keywords: arrowspace,graph-laplacian,hyperparameter-tuning,optuna,spectral-analysis,vector-search
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Science/Research
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Typing :: Typed
Requires-Python: >=3.12
Requires-Dist: arrowspace>=0.26.0
Requires-Dist: numpy>=2.4.4
Requires-Dist: optuna>=4.8.0
Requires-Dist: scipy>=1.17.1
Provides-Extra: dev
Requires-Dist: mypy>=1.15; extra == 'dev'
Requires-Dist: pandas>=3.0.0; extra == 'dev'
Requires-Dist: plotly>=6.7.0; extra == 'dev'
Requires-Dist: pytest-cov>=5.0; extra == 'dev'
Requires-Dist: pytest>=8.0; extra == 'dev'
Requires-Dist: ruff>=0.9; extra == 'dev'
Provides-Extra: report
Requires-Dist: pandas>=3.0.0; extra == 'report'
Requires-Dist: plotly>=6.7.0; extra == 'report'
Description-Content-Type: text/markdown

# arrowspace_tuner

[![CI](https://github.com/Genefold/arrowspace_tuner/actions/workflows/ci.yml/badge.svg)](https://github.com/Genefold/arrowspace_tuner/actions/workflows/ci.yml)
[![PyPI](https://img.shields.io/pypi/v/arrowspace-tuner)](https://pypi.org/project/arrowspace-tuner/)
[![Python](https://img.shields.io/pypi/pyversions/arrowspace-tuner)](https://pypi.org/project/arrowspace-tuner/)
[![License](https://img.shields.io/badge/license-Apache--2.0-blue)](LICENSE)

Hyperparameter discovery for [ArrowSpace](https://github.com/tuned-org-uk/arrowspace-rs). It automatically finds the best `eps`, `k`, and `tau` for your corpus using a query-free spectral objective.

## Why

ArrowSpace's retrieval quality depends on three graph-construction parameters:

| Parameter | What it controls |
|---|---|
| `eps` | Neighbourhood radius for graph edges |
| `k` | Number of nearest neighbours per node |
| `tau` | Search temperature (exploration vs. exploitation) |

Setting these by hand is tedious and corpus-dependent. `arrowspace_tuner` uses [Optuna](https://optuna.org/) and a label-free spectral MRR proxy to find them automatically, in roughly half an hour on a 50k corpus when trials run on a subsample (see [Speed](#speed)).

## Install

```bash
# Core (no pandas/plotly)
pip install arrowspace-tuner

# With HTML/CSV reporting (quotes keep shells such as zsh from expanding the brackets)
pip install "arrowspace-tuner[report]"
```

## Quickstart

```python
import numpy as np
import arrowspace_tuner as arrowspace

embeddings = np.load("corpus.npy")   # shape (N, D) float64

# One-liner: auto-discover eps, k, tau (see the Speed section for expected runtimes)
aspace, gl = arrowspace.optuna(embeddings)

# Search as normal; query_embedding is a (D,) vector from the same embedding model as the corpus
results = aspace.search(query_embedding, gl, tau=0.8)
```

## Power-user API

```python
from arrowspace_tuner import EpsTuner

tuner = EpsTuner(
    n_trials  = 15,
    sample_n  = 5_000,    # 33x faster: explore on 5k, final build on full corpus
    eps_low   = 0.8,      # narrow bounds if you know your corpus geometry
    eps_high  = 2.5,
    k_low     = 15,
    k_high    = 40,
    tau_low   = 0.05,
    tau_high  = 0.5,
    n_probe   = 50,
    storage   = "sqlite:///tune.db",   # resume interrupted runs
)

aspace, gl = tuner.fit(embeddings)

print(tuner.best_params)    # {"eps": 1.615, "k": 38, "tau": 0.114}
print(tuner.best_score)     # 2.138
print(tuner.best_fiedler)   # 0.718  — graph connectivity health
print(tuner.best_mrr_proxy) # 2.896  — retrieval coherence proxy

# Save CSV + HTML plots (requires [report] extra)
tuner.save_report(out_dir="results")
```

## Speed

The dominant cost of each trial is building the ArrowSpace graph on all N vectors. Setting `sample_n` runs the trials on a subsample of that size instead:

| Setting | Per trial | 15 trials | Notes |
|---|---|---|---|
| Full corpus (50k) | ~23 min | ~5.8h | baseline |
| `sample_n=5_000` | ~1.5 min | **~27 min** | **33x faster, same best params** |

The final build after the study always uses the full corpus.
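
The subsampling idea can be sketched as follows. This is a minimal illustration, assuming uniform random sampling without replacement; the helper name `subsample_rows` and the `seed` parameter are hypothetical and not part of the `arrowspace_tuner` API, whose actual sampling strategy may differ.

```python
import numpy as np


def subsample_rows(embeddings: np.ndarray, sample_n: int, seed: int = 0) -> np.ndarray:
    """Return `sample_n` rows drawn uniformly without replacement.

    Hypothetical helper: illustrates exploring on a subsample while the
    full corpus is kept for the final build.
    """
    n_rows = embeddings.shape[0]
    if sample_n >= n_rows:
        # Nothing to gain from sampling; use the corpus as-is.
        return embeddings
    rng = np.random.default_rng(seed)
    idx = rng.choice(n_rows, size=sample_n, replace=False)
    # Sorting keeps the original row order, which makes runs easier to compare.
    return embeddings[np.sort(idx)]


# Synthetic stand-in for a 50k corpus with 8-dimensional embeddings.
corpus = np.random.default_rng(42).normal(size=(50_000, 8))
sample = subsample_rows(corpus, sample_n=5_000)
print(sample.shape)  # (5000, 8)
```

Trials would then be scored on `sample`, and only the winning parameters are applied to `corpus`.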

## Objective

The objective is a weighted composite of three spectral signals; no ground-truth labels are required:

```
score = 0.70 * mrr_top0_spectral   # retrieval coherence
      + 0.20 * log1p(fiedler)      # graph connectivity health
      + 0.10 * log1p(var_lambda)   # spectral richness
```
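
As a worked example, the formula above can be written as a small Python function. This is a sketch, not the package's internal code: the function name `composite_score` is hypothetical, it assumes `best_mrr_proxy` corresponds to the `mrr_top0_spectral` term, and `var_lambda` is set to `0.0` because the README does not report a value for it.

```python
import math


def composite_score(mrr_top0_spectral: float, fiedler: float, var_lambda: float) -> float:
    """Weighted composite of the three spectral signals (hypothetical helper)."""
    return (
        0.70 * mrr_top0_spectral        # retrieval coherence
        + 0.20 * math.log1p(fiedler)    # graph connectivity health
        + 0.10 * math.log1p(var_lambda) # spectral richness
    )


# MRR proxy and Fiedler value taken from the Power-user API output above;
# var_lambda = 0.0 is a placeholder, so this is a lower bound on the score.
score = composite_score(mrr_top0_spectral=2.896, fiedler=0.718, var_lambda=0.0)
print(round(score, 3))  # 2.135
```

With these inputs the MRR term contributes about 2.03 of the total, so the score is dominated by retrieval coherence; the `log1p` transform keeps the two graph-level terms from swamping it when their raw values are large.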

## Parallel runs

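Pointing several workers at the same SQLite storage lets them share one Optuna study and run trials simultaneously. For more than a handful of workers, Optuna's documentation recommends a server-based database such as MySQL or PostgreSQL instead of SQLite.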

```bash
# Terminal 1
python -m arrowspace_tuner --storage sqlite:///tune.db --trials 15

# Terminal 2 (simultaneously)
python -m arrowspace_tuner --storage sqlite:///tune.db --trials 15
```

## Requirements

- Python ≥ 3.12
- `arrowspace >= 0.26.0`
- `optuna >= 4.8.0`
- `scipy >= 1.17.1`
- `numpy >= 2.4.4`

## License

Apache-2.0 — see [LICENSE](LICENSE).
