Metadata-Version: 2.4
Name: stemlite
Version: 0.1.0
Summary: A tiny, zero-dependency collection of English stemmers (Porter, Snowball, Lancaster).
Project-URL: Homepage, https://github.com/alexeygrigorev/stemlite
Project-URL: Repository, https://github.com/alexeygrigorev/stemlite
Author-email: Alexey Grigorev <alexey.s.grigoriev@gmail.com>
License: WTFPL
Keywords: lancaster,nlp,porter,search,snowball,stemmer,stemming,zero-dependency
Classifier: Intended Audience :: Developers
Classifier: Programming Language :: Python :: 3
Classifier: Topic :: Text Processing :: Indexing
Classifier: Topic :: Text Processing :: Linguistic
Requires-Python: >=3.9
Description-Content-Type: text/markdown

# stemlite

A tiny, **zero-dependency** collection of English stemmers: standard library
only, with no `nltk`, `numpy`, or other third-party packages. It provides
pure-Python implementations of the classic **Porter**, **Snowball** (Porter2),
and **Lancaster** stemming algorithms, plus a small registry so you can pick
one by name.

It exists to be a single, self-contained, stable dependency shared across
[`minsearch`](https://github.com/alexeygrigorev/minsearch),
[`zerosearch`](https://github.com/alexeygrigorev/zerosearch), and
`sqlitesearch`, so that those search libraries can normalize words the same way
without each carrying its own copy or pulling in a heavyweight NLP stack.
It is designed to run anywhere Python runs, including constrained environments
like Cloudflare Python Workers (Pyodide).

> Note: these are pragmatic, simplified implementations tuned for search-time
> normalization, not bit-for-bit reference implementations of the published
> algorithms.

## Install

```bash
pip install stemlite
```

## Usage

```python
from stemlite import (
    STEMMERS,
    get_stemmer,
    lancaster_stemmer,
    porter_stemmer,
    snowball_stemmer,
)

porter_stemmer("running")      # -> "run"
snowball_stemmer("running")    # -> "run"
lancaster_stemmer("running")   # -> "run"

# Pick a stemmer by name (case-insensitive).
stem = get_stemmer("porter")
stem("running")                # -> "run"

# None (or an unknown name) returns a no-op stemmer that only lowercases.
noop = get_stemmer(None)
noop("Running")                # -> "running"
```
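
In a search setting, stemming only helps if documents and queries pass through the same normalizer. The sketch below illustrates that idea. `tokenize`, `build_index`, and `search` are hypothetical helpers, not part of stemlite, and the default `stem=str.lower` merely stands in for the no-op stemmer so the snippet runs without the package installed. In real use you would likely pass something like `get_stemmer("porter")` instead.

```python
import re
from collections import defaultdict
from typing import Callable, Dict, List, Set

# Any callable mapping a word to its normalized form (assumed shape).
Stemmer = Callable[[str], str]


def tokenize(text: str, stem: Stemmer = str.lower) -> List[str]:
    """Split text into alphabetic tokens and normalize each one with `stem`."""
    return [stem(tok) for tok in re.findall(r"[A-Za-z]+", text)]


def build_index(docs: Dict[int, str], stem: Stemmer = str.lower) -> Dict[str, Set[int]]:
    """Map each normalized token to the ids of the documents containing it."""
    index: Dict[str, Set[int]] = defaultdict(set)
    for doc_id, text in docs.items():
        for tok in tokenize(text, stem):
            index[tok].add(doc_id)
    return index


def search(index: Dict[str, Set[int]], query: str, stem: Stemmer = str.lower) -> Set[int]:
    """Return the ids of documents that contain every normalized query token."""
    tokens = tokenize(query, stem)
    if not tokens:
        return set()
    result = set(index.get(tokens[0], set()))
    for tok in tokens[1:]:
        result &= index.get(tok, set())
    return result


docs = {1: "Running shoes", 2: "Trail maps"}
index = build_index(docs)
print(search(index, "RUNNING"))  # {1}
```

Passing the same `stem` callable to both `build_index` and `search` is the important part: a mismatch between index-time and query-time normalization silently drops matches.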

### `get_stemmer(name)`

```python
from typing import Callable, Optional

def get_stemmer(name: Optional[str] = None) -> Callable[[str], str]: ...
```

Accepts `"porter"`, `"snowball"`, `"lancaster"`, `"none"`, or `None`, and
returns a `Callable[[str], str]`. An unknown name (or `None`) falls back to the
no-op stemmer, which just lowercases the input. Names are matched
case-insensitively.

### `STEMMERS`

The registry backing `get_stemmer`, a `Dict[str, Callable[[str], str]]` keyed by
`"porter"`, `"snowball"`, `"lancaster"`, and `"none"`.
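
The lookup rules described above (case-insensitive names, with a fallback to a lowercasing no-op) can be sketched roughly as follows. This is not stemlite's source: `REGISTRY`, `lookup`, `noop`, and `toy_stemmer` are hypothetical stand-ins used only to show the pattern, and `toy_stemmer` is a placeholder rather than any of the real algorithms.

```python
from typing import Callable, Dict, Optional

Stemmer = Callable[[str], str]


def noop(word: str) -> str:
    """No-op stemmer: only lowercases the input."""
    return word.lower()


def toy_stemmer(word: str) -> str:
    """Placeholder for a real algorithm: lowercases and strips one trailing 's'."""
    word = word.lower()
    return word[:-1] if word.endswith("s") and len(word) > 3 else word


# Hypothetical registry; the real STEMMERS maps names to the real stemmers.
REGISTRY: Dict[str, Stemmer] = {"toy": toy_stemmer, "none": noop}


def lookup(name: Optional[str] = None) -> Stemmer:
    """Case-insensitive lookup that falls back to the no-op stemmer."""
    if name is None:
        return noop
    return REGISTRY.get(name.lower(), noop)


print(lookup("TOY")("Cats"))         # cat
print(lookup("unknown")("Running"))  # running
print(lookup(None)("Running"))       # running
```

Falling back to a no-op instead of raising keeps callers simple, at the cost that a misspelled name silently disables stemming.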

## Choosing a stemmer

* **Porter**: the classic, conservative choice. A good default.
* **Snowball** (Porter2): a refinement of Porter, with slightly different
  handling of edge cases.
* **Lancaster**: the most aggressive of the three. It strips suffixes harder,
  which raises recall but causes more collisions between unrelated words.
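
One way to judge the recall-versus-collisions trade-off on your own vocabulary is to group words by the stem they reduce to and see which groups contain more than one distinct word. The sketch below assumes only that a stemmer is a `Callable[[str], str]`. `group_by_stem` and `collisions` are hypothetical helpers, and the two stand-in stemmers (plain lowercasing and a crude four-character truncation) are illustrative only; they are not Porter, Snowball, or Lancaster.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List

Stemmer = Callable[[str], str]


def group_by_stem(words: Iterable[str], stem: Stemmer) -> Dict[str, List[str]]:
    """Group words under the stem they reduce to, preserving input order."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for word in words:
        groups[stem(word)].append(word)
    return dict(groups)


def collisions(words: Iterable[str], stem: Stemmer) -> Dict[str, List[str]]:
    """Keep only the stems shared by more than one distinct word."""
    return {s: ws for s, ws in group_by_stem(words, stem).items() if len(set(ws)) > 1}


def truncate4(word: str) -> str:
    """Deliberately over-aggressive stand-in: lowercase, keep four characters."""
    return word.lower()[:4]


words = ["universe", "university", "Universal"]
print(collisions(words, str.lower))  # {}
print(collisions(words, truncate4))  # {'univ': ['universe', 'university', 'Universal']}
```

Swapping in the stemmers returned by `get_stemmer` for the stand-ins would let you compare the three algorithms on a sample of your own corpus before committing to one.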

## Development

```bash
make setup     # uv sync --dev
make test      # run the test suite
make coverage  # run tests with coverage
```

## License

WTFPL.
