Metadata-Version: 2.4
Name: wesh
Version: 3.1.1
Summary: Full text and vector search library with pluggable storage (file, SQL, Redis). Fork of Whoosh.
Keywords: search,fulltext,full-text,indexing,bm25,spell-check,stemming,vector,embedding,knn,semantic-search
Author-Email: Stefane Fermigier <sf@abilian.com>
License-Expression: BSD-2-Clause
License-File: LICENSE.txt
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Information Technology
Classifier: Natural Language :: English
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Programming Language :: Python :: 3.14
Classifier: Programming Language :: Python :: Implementation :: CPython
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Classifier: Topic :: Text Processing :: Indexing
Classifier: Topic :: Text Processing :: Linguistic
Project-URL: Homepage, https://github.com/abilian/wesh
Project-URL: Documentation, https://github.com/abilian/wesh/tree/main/docs
Project-URL: Repository, https://github.com/abilian/wesh
Project-URL: Issues, https://github.com/abilian/wesh/issues
Project-URL: Changelog, https://github.com/abilian/wesh/blob/main/CHANGES.md
Requires-Python: >=3.12
Requires-Dist: orjson>=3.10
Requires-Dist: loguru>=0.7
Provides-Extra: sql
Requires-Dist: sqlalchemy>=2.0; extra == "sql"
Provides-Extra: redis
Requires-Dist: redis>=5.0; extra == "redis"
Provides-Extra: vector
Requires-Dist: numpy>=1.24; extra == "vector"
Provides-Extra: model2vec
Requires-Dist: wesh[vector]; extra == "model2vec"
Requires-Dist: model2vec>=0.8; extra == "model2vec"
Provides-Extra: fastembed
Requires-Dist: wesh[vector]; extra == "fastembed"
Requires-Dist: fastembed>=0.3; extra == "fastembed"
Requires-Dist: onnxruntime<1.26; extra == "fastembed"
Provides-Extra: hnsw
Requires-Dist: wesh[vector]; extra == "hnsw"
Requires-Dist: hnswlib>=0.7; extra == "hnsw"
Provides-Extra: openai
Requires-Dist: wesh[vector]; extra == "openai"
Requires-Dist: httpx>=0.27; extra == "openai"
Provides-Extra: all
Requires-Dist: wesh[sql]; extra == "all"
Requires-Dist: wesh[redis]; extra == "all"
Requires-Dist: wesh[model2vec]; extra == "all"
Requires-Dist: wesh[fastembed]; extra == "all"
Requires-Dist: wesh[hnsw]; extra == "all"
Requires-Dist: wesh[openai]; extra == "all"
Description-Content-Type: text/markdown

# Wesh?!

> **Wesh?!** is a fast, pure-Python full-text and vector search library
> with pluggable storage: filesystem, RAM, SQL, or Redis.

It's a 3.0 reboot of [Whoosh](https://github.com/mchaput/whoosh) by Matt
Chaput (via the [Sygil-Dev/whoosh-reloaded](https://github.com/Sygil-Dev/whoosh-reloaded)
fork). Both upstreams are unmaintained.

The original upstream README is preserved as [`README-whoosh.md`](README-whoosh.md).

## Install

```bash
pip install wesh
# with vector / semantic search (lightweight, works on Alpine too):
pip install "wesh[model2vec]"
# or with SQL backend:
pip install "wesh[sql]"
# or with Redis backend:
pip install "wesh[redis]"
```

Python 3.12+. Core deps are tiny (`orjson`, `loguru`). Backends and
embedders are opt-in extras: `[sql]`, `[redis]`, `[vector]`,
`[model2vec]`, `[fastembed]`, `[hnsw]`, `[openai]`, or `[all]` for the
lot.

## Quick start

```python
from wesh import index
from wesh.fields import Schema, TEXT, ID
from wesh.qparser import QueryParser

schema = Schema(title=TEXT(stored=True), path=ID(stored=True), content=TEXT)
ix = index.create_in("./indexdir", schema)

with ix.writer() as w:
    w.add_document(title="First", path="/a", content="hello world")
    w.add_document(title="Second", path="/b", content="goodbye world")

with ix.searcher() as s:
    q = QueryParser("content", ix.schema).parse("hello")
    for hit in s.search(q):
        print(hit["title"], hit.score)
```

Want to store the index elsewhere? Pass a URL instead of a path:

```python
ix = index.create_in("sqlite:///./wesh.db", schema)
ix = index.create_in("postgresql://user:pw@host/db", schema)
ix = index.create_in("redis://localhost:6379/0", schema)
```

Full documentation: build it locally with `uv run zensical build`
(see `site/`) or read the Markdown sources under [`docs/`](docs/).

## What's new in 3.0

- **Pluggable storage backends.** File, RAM, SQL (SQLAlchemy), Redis. Same
  Wesh API across all of them.
- **Vector search.** `VectorField` + `KnnQuery` + an `Embedder` ABC.
  Three embedders ship in-box:
  [`Model2VecEmbedder`](src/wesh/embedding/model2vec.py) (pure-numpy,
  runs everywhere — *recommended default*),
  [`FastembedEmbedder`](src/wesh/embedding/fastembed.py) (ONNX, more
  models, glibc-only), and
  [`OpenAIEmbedder`](src/wesh/embedding/openai.py) (HTTP, BYO endpoint).
- **Columnar numeric values** with an auto-routed fast path for numeric
  range queries (5.9× faster than the in-RAM W3 codec at 64K docs on the
  SQL hybrid codec).
- **Facets v1** — count-histogram facets with Typesense-style filter
  semantics.
- **Python 3.12 floor**; everything below the public API surface was
  rebuilt.
- **Modernised tooling**: `uv`, `ruff`, `mypy`, `pyrefly`, `ty`, `orjson`,
  `nox`, `pytest`.

Full release notes: [`CHANGES.md`](CHANGES.md) and [`docs/releases/3_0.md`](docs/releases/3_0.md).

## Performance, honestly

Pure Python is 10–50× slower than native engines (Lucene, tantivy, …). If
you need sub-millisecond queries over billions of documents, use one of
those. Wesh?! is sized for the 10K–10M document range with patient
indexing, where the win is "embedded in your Python app, no separate
service, no native build."

That said, the SQL hybrid codec ended up **faster than the in-RAM W3 codec
at 64K docs on most query shapes** — see the
[backends overview](docs/backends.md) and the design history under
[`notes/refactor-pluggable-backends.md`](notes/refactor-pluggable-backends.md).

## Development

```bash
git clone https://github.com/abilian/wesh
cd wesh
uv sync                          # install dev deps
uv run pytest tests/             # 962 tests + 2 vector-extra skips
uv run nox -s check              # ruff + mypy + ty
uv run zensical build            # build the docs (output: site/)
```

The cross-backend correctness check is the canonical perf-affecting
guardrail:

```bash
uv run python benchmark/equivalence.py
```

It builds the same corpus on every backend and asserts identical observable
outputs (matched docs, ranked order, term info, doc count). Exits 1 on
mismatch.

## Lineage

Whoosh was created by [Matt Chaput](mailto:matt@whoosh.ca) at Side Effects
Software for the Houdini documentation site, then open-sourced. Upstream
Whoosh stopped around 2012 (1.x line) and 2014 (2.x line). The
[Sygil-Dev/whoosh-reloaded](https://github.com/Sygil-Dev/whoosh-reloaded)
fork kept the 2.x line on Python 3 but is itself unmaintained.

Wesh?! picks up from the Sygil-Dev branch, drops the legacy compatibility
layers, rebuilds the storage layer around an abstract `Storage` ABC, and
adds vector search. The Python import name was renamed from `whoosh` to
`wesh` for 3.0.

## License

BSD 2-clause (inherited from upstream Whoosh). See [`LICENSE.txt`](LICENSE.txt).
