Metadata-Version: 2.4
Name: small-hybrid-reranker
Version: 0.2.0
Summary: Lightweight hybrid reranker with baked-in model artifact.
Author: cnmoro
Keywords: nlp,ranking,reranker,search
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3 :: Only
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Requires-Python: >=3.10
Requires-Dist: joblib>=1.3
Requires-Dist: lightgbm>=4.5.0
Requires-Dist: model2vec>=0.7.0
Requires-Dist: numpy>=1.24
Requires-Dist: rank-bm25>=0.2.2
Description-Content-Type: text/markdown

# small-hybrid-reranker

`small-hybrid-reranker` is a lightweight reranker package with a baked-in trained model.

It reranks a list of passages for a query using a hybrid feature stack:
- static embeddings (`cnmoro/static-nomic-384-pten`)
- lexical overlap and token interaction sketches
- BM25 and dense retrieval priors
- listwise LightGBM ranker
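
To give a rough idea of what a lexical overlap feature can look like, here is a minimal sketch in plain Python. The `tokenize` and `lexical_overlap` helpers and the two feature names are illustrative assumptions, not the features this package actually computes.

```python
import re


def tokenize(text: str) -> set[str]:
    """Lowercase and split on non-alphanumeric characters (illustrative only)."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def lexical_overlap(query: str, passage: str) -> dict[str, float]:
    """Return two simple overlap features for one query-passage pair.

    Hypothetical helper: the package's real feature set is not documented here.
    """
    q, p = tokenize(query), tokenize(passage)
    shared = q & p
    union = q | p
    return {
        # Fraction of query tokens that also appear in the passage.
        "query_coverage": len(shared) / len(q) if q else 0.0,
        # Jaccard similarity between the two token sets.
        "jaccard": len(shared) / len(union) if union else 0.0,
    }


features = lexical_overlap(
    "What is the speed of light?",
    "The speed of light in a vacuum is about 299,792 km/s.",
)
print(features)
```

Features of this kind would be one column group among several that a ranker such as LightGBM consumes per query-passage pair.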

The model artifact is included in the package, so there is no separate checkpoint download.

## Model In This Release

- Version `0.2.0` packages an updated model trained on all available SciFact splits in this repository (`train + test`) for maximum fit.
- Training used strict BM25 top-100 candidates with LightGBM LambdaRank over the hybrid features.
- In-sample metrics over all splits (the same data used for training), from the training run:
  - `ndcg@10`: `0.89999`
  - `recall@10`: `0.89830`
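
For readers unfamiliar with these metrics, the sketch below shows one common way to compute `ndcg@k` and `recall@k` for a single query. It assumes linear gain and computes the ideal ordering and the relevant-document count from the candidate list only; the README does not state which variant the training run used, so treat this as an illustration and not as the exact evaluation code.

```python
import math


def dcg_at_k(relevances: list[float], k: int) -> float:
    """Discounted cumulative gain over the first k positions (linear gain)."""
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances[:k]))


def ndcg_at_k(ranked_relevances: list[float], k: int = 10) -> float:
    """DCG of the given order divided by the DCG of the best possible order.

    Assumption: the ideal order is built from this candidate list only.
    """
    ideal = dcg_at_k(sorted(ranked_relevances, reverse=True), k)
    return dcg_at_k(ranked_relevances, k) / ideal if ideal > 0 else 0.0


def recall_at_k(ranked_relevances: list[float], k: int = 10) -> float:
    """Share of relevant items (relevance > 0) that appear in the top k."""
    total = sum(1 for rel in ranked_relevances if rel > 0)
    found = sum(1 for rel in ranked_relevances[:k] if rel > 0)
    return found / total if total else 0.0


# Relevance labels listed in the order the reranker returned the passages.
labels = [0, 1, 0, 1]
print(ndcg_at_k(labels, k=10), recall_at_k(labels, k=2))
```

Dataset-level figures are typically the mean of these per-query values.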

Inference remains lightweight and CPU-friendly: the API is still a single `HybridReranker().rerank(query, passages)` call.

## Install

```bash
pip install small-hybrid-reranker
```

## Quickstart

```python
from small_hybrid_reranker import HybridReranker

reranker = HybridReranker()

query = "What is the speed of light?"
passages = [
    "The speed of light in a vacuum is about 299,792 km/s.",
    "Earth orbits the Sun in about 365 days.",
    "Newton described laws of motion.",
]

ranked = reranker.rerank(query, passages)
print(ranked[0])
# {'passage': 'The speed of light in a vacuum is about 299,792 km/s.', 'score': 100.0}
```

## API

### `HybridReranker(model_path: str | None = None)`

- `model_path=None`: uses the model bundled with the package.
- `model_path="...joblib"`: loads your own compatible artifact.

### `rerank(query: str, passages: list[str], top_k: int | None = None) -> list[dict]`

Returns:

```python
[
  {"passage": "...", "score": 82.31},
  {"passage": "...", "score": 40.87},
]
```

Scores are floats in `[0, 100]`, and results are sorted by score in descending order.
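
Because the result is a plain list of dicts, downstream filtering needs no special support. The sketch below applies a score cutoff to data in the documented return shape; `keep_above` is a hypothetical helper, not part of the package, and the sample values are placeholders, not real model output.

```python
def keep_above(ranked: list[dict], min_score: float) -> list[dict]:
    """Keep results whose score is at least min_score, preserving order."""
    return [item for item in ranked if item["score"] >= min_score]


# Sample data in the documented return shape (not real model output).
ranked = [
    {"passage": "first passage", "score": 82.31},
    {"passage": "second passage", "score": 40.87},
]

confident = keep_above(ranked, min_score=50.0)
print([item["passage"] for item in confident])
```

A suitable cutoff depends on your data, so it is worth checking score distributions on your own queries before fixing one.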

## Notes

- This package is optimized for reranking a provided candidate list.
- It is not a full retrieval system by itself.
