Metadata-Version: 2.4
Name: lucisearch
Version: 0.1.1
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Rust
Classifier: Programming Language :: Python :: Implementation :: CPython
Classifier: Programming Language :: Python :: 3
Classifier: Topic :: Database :: Database Engines/Servers
Classifier: Topic :: Text Processing :: Indexing
Summary: An embeddable, in-process search engine written in Rust
Keywords: search,elasticsearch,bm25,vector,embedded,full-text,lucene
Author: Tshimanga
License-Expression: MIT
Requires-Python: >=3.10
Description-Content-Type: text/markdown; charset=UTF-8; variant=GFM

# lucisearch

An embeddable, in-process search engine written in Rust with Python bindings.

## Features

- **Full-text search** with BM25 scoring, phrase queries, and fuzzy matching
- **Structured queries** — term, range, bool, geo_distance, geo_shape, nested
- **Aggregations** — terms, stats, histogram, date_histogram, cardinality, percentiles, and 12 more
- **Sort by field** on keyword, numeric, and boolean columns
- **Pagination** — `from`/`size` and cursor-based `search_after`
- **Document CRUD** — get, delete, update by `_id`, delete by query
- **Hybrid search** — dense vector kNN + text via RRF fusion
- **1.6-5.5x faster than Elasticsearch** on the queries benchmarked below
- **No server, no cluster, no HTTP** — just `pip install lucisearch` and search

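Hybrid search merges a text result list and a kNN vector result list with Reciprocal Rank Fusion (RRF). The sketch below shows the standard RRF formula, where each document scores `sum(1 / (k + rank))` over the lists it appears in. It is a generic illustration, not lucisearch's internal code; the `rrf_fuse` helper, the constant `k = 60`, and the document IDs are assumptions chosen for the example.

```python
from collections import defaultdict


def rrf_fuse(rankings, k=60):
    """Fuse several ranked lists of doc ids with Reciprocal Rank Fusion.

    Each document's fused score is the sum of 1 / (k + rank) over every
    list it appears in, with ranks starting at 1. `k` is an assumed default.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by doc id for determinism.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical result lists from a BM25 text query and a kNN vector query.
text_hits = ["doc_a", "doc_b", "doc_c"]
vector_hits = ["doc_c", "doc_a", "doc_d"]

for doc_id, score in rrf_fuse([text_hits, vector_hits]):
    print(f"{doc_id}: {score:.5f}")
```

Because RRF uses only ranks, it needs no normalization between BM25 scores and vector similarities, which live on different scales. Documents found by both retrievers (`doc_a`, `doc_c`) rise above those found by only one.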
## Quick Start

```python
import luci

# Create an index
index = luci.Index.create("my_index.luci", {
    "properties": {
        "title": {"type": "text"},
        "body": {"type": "text"},
        "tag": {"type": "keyword"},
        "price": {"type": "float"},
    }
})

# Add documents
index.add({"title": "Hello World", "body": "Getting started", "tag": "intro", "price": 9.99})
index.add({"title": "Search Engine", "body": "Full-text search", "tag": "tech", "price": 29.99})
index.commit()

# Search (ES-compatible query DSL)
results = index.search({
    "query": {"match": {"title": "hello"}},
    "size": 10
})

for hit in results["hits"]:
    print(hit["_score"], hit["_source"]["title"])

# Sort by field
results = index.search({
    "query": {"match_all": {}},
    "sort": [{"price": "asc"}],
    "size": 5
})

# Aggregations
results = index.search({
    "query": {"match_all": {}},
    "aggs": {"by_tag": {"terms": {"field": "tag"}}},
    "size": 0
})

# CRUD operations by _id ("my_doc_id" stands in for a real document ID)
doc = index.get("my_doc_id")
index.update("my_doc_id", {"price": 19.99})
index.delete("my_doc_id")
index.delete_by_query({"term": {"tag": "draft"}})
```
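The `_score` values returned by `match` queries come from BM25. As a rough illustration of how BM25 weighs term frequency, rarity, and document length, here is a minimal sketch of the textbook Okapi BM25 formula with the non-negative IDF variant used by Lucene. The parameters `k1 = 1.2` and `b = 0.75` are common defaults assumed here; lucisearch's exact parameters and tokenization are not documented in this README, so its scores may differ.

```python
import math
from collections import Counter


def bm25_scores(query_terms, docs, k1=1.2, b=0.75):
    """Score each tokenized doc against query_terms with Okapi BM25."""
    n_docs = len(docs)
    avgdl = sum(len(d) for d in docs) / n_docs
    # Document frequency: number of docs containing each term at least once.
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for doc in docs:
        tf = Counter(doc)
        score = 0.0
        for term in query_terms:
            if term not in tf:
                continue
            # Non-negative IDF: rare terms contribute more than common ones.
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Length normalization: longer-than-average docs are penalized.
            norm = tf[term] + k1 * (1 - b + b * len(doc) / avgdl)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


# Lowercased, whitespace-tokenized titles from the Quick Start documents.
docs = ["hello world".split(), "search engine".split()]
print(bm25_scores(["hello"], docs))
```

Here both titles have average length, so the length term cancels and the matching document scores exactly `ln 2`; the non-matching one scores 0.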

## Performance

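Benchmarked on 100,000 documents (Apple M5 Max). Latencies are in microseconds (μs), lower is better; Speedup is the other engine's latency divided by Luci's.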

### vs Elasticsearch

| Query | Luci | ES | Speedup |
|-------|------|-----|---------|
| sort_numeric | 1074μs | 2149μs | 2.0x |
| fields_only | 281μs | 1225μs | 4.4x |
| explain | 404μs | 1368μs | 3.4x |
| from_0 | 369μs | 1046μs | 2.8x |
| rescore | 362μs | 1994μs | 5.5x |
| source_disabled | 275μs | 897μs | 3.3x |
| collapse | 3332μs | 5308μs | 1.6x |
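The Speedup column can be recomputed directly from the two latency columns. The snippet below does only that arithmetic on the numbers in the table; it does not rerun any benchmark, and the `speedup` helper is a name chosen for this example.

```python
# Latencies in microseconds copied from the Elasticsearch table: (Luci, ES).
latencies_us = {
    "sort_numeric": (1074, 2149),
    "fields_only": (281, 1225),
    "explain": (404, 1368),
    "from_0": (369, 1046),
    "rescore": (362, 1994),
    "source_disabled": (275, 897),
    "collapse": (3332, 5308),
}


def speedup(luci_us, other_us):
    """How many times faster Luci is: the other engine's latency / Luci's."""
    return other_us / luci_us


for query, (luci, es) in latencies_us.items():
    print(f"{query}: {speedup(luci, es):.1f}x")

ratios = [speedup(luci, es) for luci, es in latencies_us.values()]
print(f"range: {min(ratios):.1f}x to {max(ratios):.1f}x")
```

Rounded to one decimal, the ratios match the table and span 1.6x (`collapse`) to 5.5x (`rescore`).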

### vs Tantivy

| Query | Luci | Tantivy | Speedup |
|-------|------|---------|---------|
| match_phrase | 190μs | 409μs | 2.1x |
| match_all | 272μs | 488μs | 1.8x |
| from_1000 | 273μs | 519μs | 1.9x |
| no_source | 68μs | 140μs | 2.0x |
| bool_must | 280μs | 362μs | 1.3x |

## License

MIT

