Metadata-Version: 2.4
Name: skimmatch
Version: 0.2.1
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.13
Classifier: Programming Language :: Rust
Summary: In-process fzf/skim-style fuzzy finder for Python, implemented in Rust.
Author: TELOS
License: MIT
Requires-Python: >=3.13
Description-Content-Type: text/markdown; charset=UTF-8; variant=GFM
Project-URL: Homepage, https://github.com/mynl/skimmatch
Project-URL: Issues, https://github.com/mynl/skimmatch/issues
Project-URL: Repository, https://github.com/mynl/skimmatch

# skimmatch

`skimmatch` is an in-process fzf/skim-style fuzzy finder for Python,
implemented in Rust.

It is designed for ranked abbreviation matching over a fixed list of candidate
strings. You give it strings such as filenames, references, titles, symbols, or
command labels; users type short abbreviation-style queries; `skimmatch`
returns the best candidates, scores, and optional highlight positions.

```python
from skimmatch import Matcher

candidates = [
    "Follmer and Schied, Stochastic Finance, 2011",
    "Mildenhall and Major, Pricing Insurance Risk",
    "Wang distortion risk measures",
    "Archive reference catalogue",
]

matcher = Matcher(candidates)

for result in matcher.search("wang distortion", limit=3):
    print(result)
```

Example result:

```python
{
    "index": 2,
    "score": 260,
    "text": "Wang distortion risk measures",
    "matches": [0, 1, 2, 3, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14],
}
```

Scores come from the selected backend, and higher is better. Treat the exact
numeric value as ranking information, not as a metric that is stable across
versions.

## What This Is

`skimmatch` solves the same broad problem as interactive fuzzy finders such as
`fzf` and `skim`: finding good abbreviation matches quickly.

For example, a query like:

```text
fs sf 2011
```

can match:

```text
Follmer and Schied, Stochastic Finance, 2011
```

because the query characters and tokens appear in useful positions and in the
right order.
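
The "in the right order" condition can be sketched in plain Python. This is a
minimal illustration of the filter only, not the scoring used by any
`skimmatch` backend: the helper names `subsequence_positions` and
`matches_all_tokens` are hypothetical, the matching is greedy and leftmost, and
case folding is assumed to be safe for simple ASCII text.

```python
from typing import List, Optional


def subsequence_positions(token: str, text: str) -> Optional[List[int]]:
    """Return the leftmost in-order positions of token's characters, or None."""
    lowered = text.lower()  # assumption: lowercasing keeps positions aligned (ASCII)
    positions = []
    start = 0
    for ch in token.lower():
        found = lowered.find(ch, start)
        if found == -1:
            return None  # a character is missing or out of order
        positions.append(found)
        start = found + 1
    return positions


def matches_all_tokens(query: str, text: str) -> bool:
    """True when every whitespace-separated token is a subsequence of text."""
    return all(subsequence_positions(t, text) is not None for t in query.split())


candidate = "Follmer and Schied, Stochastic Finance, 2011"
print(matches_all_tokens("fs sf 2011", candidate))  # True
print(matches_all_tokens("fs 2012", candidate))     # False
```

A real backend goes further: among all valid alignments it prefers the ones
that score best, for example those that land on word boundaries.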

This is different from edit-distance fuzzy matching. Libraries such as
RapidFuzz, Levenshtein, or token-ratio matchers are excellent for typo
correction, deduplication, OCR cleanup, and record linkage. `skimmatch` is aimed
at fast candidate selection, interactive search, and highlightable abbreviation
matching.

## Features

- In-process Python extension: no external `fzf` executable required.
- Rust matching backends using `SkimMatcherV2`, `nucleo-matcher`, and
  `frizbee`.
- Preloaded candidate lists for fast repeated queries.
- Single-token and multi-token search modes.
- Optional highlight indices for UI rendering.
- Legacy tuple-returning APIs for compatibility with the earlier `rustfuzz`
  shape.
- Structured `Matcher.search(...)` API for new code.
- A `backend` argument for selecting the matcher, so future backends can be
  added without changing the public matcher classes.

## Installation

When published on PyPI:

```bash
pip install skimmatch
```

From a local checkout:

```bash
uv pip install -e .
```

Or build and install the extension into the current environment with maturin:

```bash
uv run maturin develop
```

The current package metadata targets Python 3.13 or newer.

## Quick Start

Use `Matcher` for new code.

```python
from skimmatch import Matcher

candidates = [
    "Buhlmann, Mathematical Methods in Risk Theory",
    "Cramer, Collective Risk Theory",
    "Mildenhall and Major, Pricing Insurance Risk",
    "Kaas, Goovaerts, Dhaene, and Denuit, Modern Actuarial Risk Theory",
]

matcher = Matcher(candidates)
results = matcher.search("risk theory", limit=5)

for result in results:
    print(result["index"], result["score"], result["text"])
```

By default, `search`:

- splits the query on whitespace;
- requires every query token to match;
- returns up to 20 results;
- includes candidate text;
- includes highlight positions.

## Structured API

```python
matcher = Matcher(candidates, backend="nucleo")  # or "skim" or "frizbee"
results = matcher.search(
    query,
    limit=20,
    highlights=True,
    include_text=True,
    multi=True,
)
```

Each result is a dictionary containing:

```python
{
    "index": 0,         # original candidate index
    "score": 123,       # backend score, higher is better
    "text": "...",      # included when include_text=True
    "matches": [0, 3],  # included when highlights=True
}
```

### Parameters

`query`

The search string. In multi-token mode, whitespace-separated tokens are matched
independently and every token must match the candidate.

`limit`

The maximum number of results to return. `limit=0` returns an empty list.

`highlights`

When true, results include `matches`, a sorted and deduplicated list of matched
positions. Turn this off when you only need ranking; score-only matching does
less work.
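
One way to use `matches` in a UI is to wrap each run of matched positions in
markers. The sketch below assumes the positions index characters of the
candidate string; `render_highlights` is a hypothetical helper, not part of
`skimmatch`, and the sample result is hand-written rather than real library
output.

```python
from typing import List


def render_highlights(text: str, matches: List[int],
                      open_mark: str = "[", close_mark: str = "]") -> str:
    """Wrap each run of consecutive matched positions in open/close markers."""
    matched = set(matches)
    out = []
    inside = False
    for i, ch in enumerate(text):
        if i in matched and not inside:
            out.append(open_mark)   # a highlighted run starts here
            inside = True
        elif i not in matched and inside:
            out.append(close_mark)  # the run ended on the previous character
            inside = False
        out.append(ch)
    if inside:
        out.append(close_mark)      # close a run that reaches the end of text
    return "".join(out)


# Hand-written sample in the result shape described above.
sample = {"index": 0, "score": 1, "text": "Stochastic Finance", "matches": [0, 11]}
print(render_highlights(sample["text"], sample["matches"]))
# [S]tochastic [F]inance
```

Swapping the markers for `<b>` and `</b>`, or for terminal escape codes, gives
HTML or console highlighting with the same loop.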

`include_text`

When true, each result includes the original candidate string. Turn this off if
you already have the candidate list and want smaller result objects.

`multi`

When true, the query is split on whitespace and all tokens are required. When
false, the whole query is sent to the matcher as one pattern.

## Legacy APIs

The package also exports compatibility classes with tuple return shapes:

```python
from skimmatch import FuzzyMatcher, FuzzyMatcherMulti, FuzzyMatcherMultiHi
```

### `FuzzyMatcher`

Treats the whole query as one pattern.

```python
matcher = FuzzyMatcher(candidates)
indices, scores = matcher.query("sf", top_k=10)
```

### `FuzzyMatcherMulti`

Splits the query on whitespace. Every token must match.

```python
matcher = FuzzyMatcherMulti(candidates)
indices, scores = matcher.query("pricing insurance", top_k=10)
```

### `FuzzyMatcherMultiHi`

Like `FuzzyMatcherMulti`, but also returns highlight positions.

```python
matcher = FuzzyMatcherMultiHi(candidates)
indices, scores, highlights = matcher.query("pricing insurance", top_k=10)
```

## Matching Behavior

The available backends are:

```python
backend="skim"
backend="nucleo"
backend="frizbee"
```

`backend="skim"` uses `SkimMatcherV2` from the Rust `fuzzy-matcher` crate and
is kept for compatibility.

`backend="nucleo"` is the default. It uses `nucleo-matcher`, the lower-level
matcher from the nucleo ecosystem, a modern fzf-like matcher that may rank
candidates differently from `skim`. Scores are backend-specific and should not
be compared between backends.

`backend="frizbee"` uses `frizbee`, a SIMD matcher with typo-resistant matching
support. `skimmatch` currently runs it with typo tolerance disabled for a closer
comparison with the other fzf-style backends. It matches against bytes, so
highlight lists are intentionally empty for this backend until Unicode offset
semantics are defined.
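
The byte-versus-character issue is easy to see in plain Python. The sketch
below is not how `skimmatch` or `frizbee` works internally; it only shows why a
UTF-8 byte offset cannot be used directly as a character position once the text
contains non-ASCII characters. `byte_to_char_offsets` is a hypothetical helper.

```python
from typing import List


def byte_to_char_offsets(text: str, byte_offsets: List[int]) -> List[int]:
    """Map UTF-8 byte offsets to character indices in text."""
    start_to_char = {}
    pos = 0
    for char_index, ch in enumerate(text):
        start_to_char[pos] = char_index      # byte offset where this char starts
        pos += len(ch.encode("utf-8"))
    # Raises KeyError for an offset that falls in the middle of a character.
    return [start_to_char[b] for b in byte_offsets]


text = "B\u00fchlmann"                 # "Bühlmann" with a precomposed u-umlaut
data = text.encode("utf-8")
print(len(text), len(data))            # 8 9
print(data.index(b"h"))                # 3 (byte offset)
print(text.index("h"))                 # 2 (character index)
print(byte_to_char_offsets(text, [0, 3]))  # [0, 2]
```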

Scoring tends to reward:

- characters appearing in order;
- compact alignments;
- word-boundary matches;
- punctuation-separated and camel-case transitions;
- early matches;
- consecutive query-character matches;
- candidates that match every query token in multi-token mode.

`skimmatch` returns candidates sorted by descending score. Ties are ordered by
the original candidate index for deterministic output.
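
The ordering rule corresponds to a two-part sort key. The sketch below applies
it to hand-written sample dictionaries; it illustrates the documented ordering
and is not the Rust implementation.

```python
# Hand-written sample results: two candidates tie on score 90.
results = [
    {"index": 3, "score": 90},
    {"index": 0, "score": 120},
    {"index": 1, "score": 90},
]

# Descending score first, then ascending original index to break ties.
ordered = sorted(results, key=lambda r: (-r["score"], r["index"]))
print([r["index"] for r in ordered])  # [0, 1, 3]
```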

## When To Use It

`skimmatch` is a good fit for:

- command palettes;
- file pickers;
- bibliography and reference search;
- symbol search;
- autocomplete over known labels;
- terminal or web UI candidate selection;
- fast repeated queries over a preloaded list.

It is probably not the right tool for:

- typo correction;
- deduplication;
- record linkage;
- token-sort similarity;
- OCR cleanup;
- semantic search;
- embedding-based retrieval.

Those are useful problems, but they are different from fzf/skim-style
abbreviation matching.

## Performance Notes

Candidate strings are copied into Rust once when the matcher is constructed.
Repeated calls to `query` or `search` scan that Rust-owned list and return only
the final top results to Python.

For best performance:

- construct one matcher and reuse it across queries;
- set `highlights=False` when you only need indices and scores;
- set `include_text=False` when you already have the candidate strings;
- use `limit` to keep returned result objects small.
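
The tips above can be combined into one small helper. This is a sketch that
uses only the `search` keyword arguments described in this README;
`search_many` is a hypothetical name, and it accepts any object with a
compatible `search` method.

```python
def search_many(matcher, queries, limit=5):
    """Run several queries against one preloaded matcher, ranking only."""
    return {
        query: matcher.search(
            query,
            limit=limit,          # keep the returned lists small
            highlights=False,     # skip highlight positions
            include_text=False,   # caller already holds the candidate strings
        )
        for query in queries
    }


# Usage, assuming skimmatch is installed and `candidates` is a list of strings:
# from skimmatch import Matcher
# matcher = Matcher(candidates)  # build once, reuse for every query
# hits = search_many(matcher, ["risk theory", "pricing"])
```

Each value in the returned dictionary is the list that `search` produced for
that query, so the original candidates can be recovered through `index`.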

## Development

This project is a Python package with a Rust extension built by maturin.

Run the tests:

```bash
uv run pytest tests/test_skimmatch.py -q
```

Check Rust formatting:

```bash
cargo fmt --check
```

Important files:

- `src/lib.rs`: Rust/PyO3 extension implementation.
- `python/skimmatch/__init__.py`: Python re-exports.
- `tests/test_skimmatch.py`: API and behavior tests.
- `pyproject.toml`: Python packaging and maturin configuration.
- `Cargo.toml`: Rust crate configuration.

## Backend Roadmap

The public API accepts a `backend` argument. Today `"skim"`, `"nucleo"`, and
`"frizbee"` are implemented. `frizbee` is experimental and currently exposes
score/ranking behavior without highlight positions.

Unknown backend names currently raise `ValueError`.

## License

MIT.

