Metadata-Version: 2.2
Name: nyansasua
Version: 0.1.0
Summary: Multi-language NLP keyword extraction (C++17 core, pybind11 bindings).
Author: Cire contributors
License: MIT
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: C++
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Text Processing :: Indexing
Classifier: Topic :: Text Processing :: Linguistic
Project-URL: Homepage, https://github.com/yourorg/cire
Project-URL: Issues, https://github.com/yourorg/cire/issues
Requires-Python: >=3.8
Description-Content-Type: text/markdown

# Nyansasua: Blazing-fast Multi-Language Keyword Extraction

> *Nyansasua* (Twi) — **learning / wisdom**.

A self-contained, high-performance Python library for **multi-language keyword extraction**, backed by a low-level C++17 core.

## Features

- **14 Languages Supported**: English, Spanish, French, German, Italian, Portuguese, Dutch, Russian, Chinese, Japanese, Korean, Arabic, Indonesian, and **Twi**.
- **4 Extraction Algorithms**: TF-IDF, YAKE, TextRank, and RAKE. Also supports a combined **Ensemble** mode.
- **Lightning Fast**: Optimized C++ backend. Benchmarks show more than 1.5 million characters (~200,000 words) processed in under 0.5 seconds.
- **UTF-8 Native**: Handles complex Unicode and mixed-script text without breaking token boundaries.

## Installation

```bash
pip install nyansasua
```

## Quick Start

```python
# Installed from PyPI as `nyansasua`; the examples in this README import it as `cire`.
import cire

# One-liner extraction
for k in cire.extract_keywords("Machine learning is a branch of AI.", top_k=5):
    print(k.text, k.score)

# Using the Extractor class
ext = cire.Extractor(language="auto", algorithm="ensemble", top_k=10)
for k in ext.extract("Natural language processing has seen rapid growth."):
    print(k.text, k.score)
```

## Batch Processing & TF-IDF Extraction

```python
import cire

ext = cire.Extractor(language="auto")

# Extract from a batch of documents
results = ext.extract_many([
    "Python is widely used in data science.",
    "Climate change is a significant global challenge."
])

# Corpus-driven TF-IDF extraction
corpus = [
    "Python is used in data science.",
    "Java is used in enterprise environments.",
    "Python is popular for AI."
]
kws = ext.extract_corpus_tfidf(
    texts=corpus,
    target_text="Python is heavily utilized in AI and ML.",
    top_k=3
)
```
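
For intuition about what corpus-driven TF-IDF scoring computes, here is a minimal pure-Python sketch. It is not the C++ implementation behind `extract_corpus_tfidf`: the helper names (`tokenize`, `corpus_tfidf`), the tiny stopword set, and the smoothed IDF formula are all assumptions chosen for illustration.

```python
import math
import re
from collections import Counter

# Illustrative stopword set (an assumption, not the library's list).
STOPWORDS = {"a", "and", "for", "in", "is", "the"}


def tokenize(text):
    # Lowercase, split on Unicode word characters, drop stopwords.
    return [t for t in re.findall(r"\w+", text.lower()) if t not in STOPWORDS]


def corpus_tfidf(texts, target_text, top_k=3):
    """Rank the target's terms by TF-IDF against a reference corpus."""
    docs = [set(tokenize(t)) for t in texts]
    n_docs = len(docs)
    tf = Counter(tokenize(target_text))
    total = sum(tf.values())
    scores = {}
    for term, count in tf.items():
        # Document frequency: number of corpus documents containing the term.
        df = sum(1 for d in docs if term in d)
        # Smoothed IDF, so terms absent from the corpus do not divide by zero.
        idf = math.log((1 + n_docs) / (1 + df)) + 1.0
        scores[term] = (count / total) * idf
    # Highest score first; ties are broken alphabetically for determinism.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:top_k]


corpus = [
    "Python is used in data science.",
    "Java is used in enterprise environments.",
    "Python is popular for AI.",
]
ranked = corpus_tfidf(corpus, "Python is heavily utilized in AI and ML.", top_k=3)
for term, score in ranked:
    print(f"{term}\t{score:.3f}")
```

In this sketch, terms that never appear in the corpus (`heavily`, `ml`, `utilized`) receive the highest IDF, while `python`, which occurs in two of the three corpus documents, scores lowest among the target's terms.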

## Language Detection & Stopwords

```python
import cire

# Heuristically detect the language of a text
lang = cire.detect_language("Bonjour tout le monde")
print(lang)  # Language.French

# Add custom stopwords dynamically
cire.add_stopword("french", "tout")
```
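
Heuristic detection often starts from the writing system. As a rough illustration (not `cire`'s actual detector, whose internals this README does not describe), a dominant-script check can be sketched with the standard library alone. The function name `dominant_script` is hypothetical.

```python
import unicodedata
from collections import Counter


def dominant_script(text):
    """Return the most common script prefix among the letters of `text`."""
    counts = Counter()
    for ch in text:
        if ch.isalpha():
            # Unicode character names begin with the script,
            # e.g. "LATIN SMALL LETTER A" or "CYRILLIC SMALL LETTER A".
            name = unicodedata.name(ch, "")
            if name:
                counts[name.split()[0]] += 1
    # None when the text contains no letters at all.
    return counts.most_common(1)[0][0] if counts else None


print(dominant_script("Bonjour tout le monde"))
print(dominant_script("Привет мир"))
```

Script alone separates, say, Russian from Korean, but it cannot tell French from English, since both use Latin letters. A real detector needs further signals such as stopword or character n-gram statistics.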

## License

MIT License.
