Metadata-Version: 2.4
Name: anynlp
Version: 0.2.4
Summary: One-liner NLP utilities -- summarize, classify, extract entities, analyze sentiment -- with rule-based fallbacks and HuggingFace backends.
Project-URL: Homepage, https://github.com/vietanhdev/anynlp
Project-URL: Documentation, https://github.com/vietanhdev/anynlp#readme
Project-URL: Repository, https://github.com/vietanhdev/anynlp
Project-URL: Issues, https://github.com/vietanhdev/anynlp/issues
Project-URL: Author, https://www.nrl.ai
Author-email: Viet-Anh Nguyen <vietanh.dev@gmail.com>
License-Expression: MIT
License-File: LICENSE
Keywords: classification,ner,nlp,sentiment,summarization,text
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Text Processing :: Linguistic
Requires-Python: >=3.8
Requires-Dist: click>=8.0
Provides-Extra: all
Requires-Dist: anyllm>=0.1.0; extra == 'all'
Requires-Dist: gliner>=0.2.0; extra == 'all'
Requires-Dist: sentence-transformers>=2.5.0; extra == 'all'
Requires-Dist: torch>=1.9.0; extra == 'all'
Requires-Dist: transformers>=4.20.0; extra == 'all'
Provides-Extra: embeddings
Requires-Dist: sentence-transformers>=2.5.0; extra == 'embeddings'
Provides-Extra: gliner
Requires-Dist: gliner>=0.2.0; extra == 'gliner'
Provides-Extra: llm
Requires-Dist: anyllm>=0.1.0; extra == 'llm'
Provides-Extra: progress
Requires-Dist: tqdm>=4.60.0; extra == 'progress'
Provides-Extra: transformers
Requires-Dist: torch>=1.9.0; extra == 'transformers'
Requires-Dist: transformers>=4.20.0; extra == 'transformers'
Description-Content-Type: text/markdown

<h1 align="center">anynlp</h1>
<p align="center"><em>NLP utilities that work out of the box — rule-based by default, HuggingFace/LLM-boosted on demand.</em></p>

<p align="center">
<img src="https://img.shields.io/pypi/v/anynlp.svg" alt="PyPI">
<img src="https://img.shields.io/pypi/pyversions/anynlp.svg" alt="Python">
<img src="https://img.shields.io/pypi/l/anynlp.svg" alt="License">
</p>

**anynlp** is a graceful-degradation NLP toolkit. Every task (summarization, classification, NER, sentiment, keyword extraction, similarity, chunking, clustering) has a pure-Python rule-based implementation that needs no ML libraries or model downloads. Install the optional `transformers` extra for pretrained HuggingFace models, or the `llm` extra to route tasks through `anyllm` to a local or hosted LLM for higher quality. Well suited to RAG pipelines, content analysis, and lightweight text processing.

Built by [Viet-Anh Nguyen](https://github.com/vietanhdev) at [NRL.ai](https://www.nrl.ai).

## Why anynlp?

- **One-liner API** — `anynlp.sentiment("I love this")` just works, no downloads
- **Plugin architecture** — `summarize`, `classify`, `ner`, and `sentiment` offer `rules` / `hf` / `llm` backends, selectable per call with `backend=`
- **Local-first** — Rule-based backends run entirely in pure Python, zero deps
- **Minimal core deps** — Only `click` (for the CLI); heavy backends are all optional extras
- **Structured output** — Dataclass results, streaming, batch processing

## Installation

```bash
pip install anynlp
```

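For stronger backends and optional features:

```bash
pip install "anynlp[transformers]"    # HuggingFace pretrained models (transformers + torch)
pip install "anynlp[llm]"             # route through anyllm (Ollama, OpenAI, ...)
pip install "anynlp[embeddings]"      # sentence-transformers
pip install "anynlp[gliner]"          # GLiNER models (gliner package)
pip install "anynlp[progress]"        # tqdm progress bars
pip install "anynlp[all]"             # all model backends (does not include progress)
```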

**Python 3.8+ supported** (tested on 3.8, 3.9, 3.10, 3.11, 3.12, 3.13)

## Quick Start

```python
import anynlp

# 1. Extractive summarization (pure Python, zero deps)
summary = anynlp.summarize("Long article text...", max_sentences=3)

# 2. Sentiment analysis (AFINN lexicon with negation handling)
s = anynlp.sentiment("I absolutely do not love this product")
print(s.label, s.score)             # "negative" -0.4

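# 3. Named-entity recognition (regex patterns by default)
ents = anynlp.ner("Apple was founded by Steve Jobs in Cupertino in 1976.")
for e in ents:
    print(e.text, e.label)          # entity types depend on the backend
# The rules backend targets pattern entities (emails, URLs, phones, dates, money,
# percent); for names like "Steve Jobs" (PERSON) or "Apple" (ORG), use backend="hf" or "llm"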

# 4. Zero-shot classification (keyword overlap by default)
result = anynlp.classify(
    "The GPU crashed during training",
    labels=["hardware", "software", "billing"],
)
print(result.label)                 # "hardware"

# 5. RAG-style chunking
long_text = "A long document. " * 200    # placeholder text
chunks = anynlp.chunk(long_text, method="recursive", size=500, overlap=50)
```
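The Quick Start comment describes the rules sentiment backend as an AFINN lexicon with negation handling. The sketch below illustrates that general technique with a tiny, hypothetical lexicon, a fixed look-back window for negators, and made-up intensifier weights (`LEXICON`, `NEGATORS`, `INTENSIFIERS`, and `lexicon_sentiment` are all illustrative names). It is not anynlp's implementation, and its scores will not match `anynlp.sentiment()`.

```python
import re
from typing import Tuple

# Hypothetical AFINN-style lexicon: integer valence on a -5..+5 scale.
LEXICON = {"love": 3, "like": 2, "good": 3, "great": 3, "bad": -3, "hate": -3, "terrible": -3}
NEGATORS = {"not", "no", "never", "don't", "doesn't", "isn't"}
# Hypothetical multipliers applied when an intensifier precedes a lexicon word.
INTENSIFIERS = {"very": 1.5, "really": 1.5, "absolutely": 2.0, "slightly": 0.5}


def lexicon_sentiment(text: str, window: int = 3) -> Tuple[str, float]:
    """Return (label, score) with score clamped to [-1, 1]."""
    tokens = re.findall(r"[a-z']+", text.lower())
    total, hits = 0.0, 0
    for i, tok in enumerate(tokens):
        if tok not in LEXICON:
            continue
        value = float(LEXICON[tok])
        # Inspect up to `window` preceding tokens for modifiers.
        context = tokens[max(0, i - window):i]
        for prev in context:
            value *= INTENSIFIERS.get(prev, 1.0)
        if any(prev in NEGATORS for prev in context):
            value = -value  # "not love" flips polarity
        total += value
        hits += 1
    # Average per matched word, scaled by the lexicon's max magnitude (5).
    score = max(-1.0, min(1.0, total / (5.0 * hits))) if hits else 0.0
    if score > 0.05:
        return "positive", score
    if score < -0.05:
        return "negative", score
    return "neutral", score
```

With this toy lexicon, `lexicon_sentiment("I absolutely do not love this product")` returns `("negative", -1.0)`: the intensifier doubles "love" to 6, the negator flips it, and the result is clamped.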

## Models & Methods

**anynlp** has 3 backend tiers — pick based on your needs:

### 1. Rule-based backend (default, zero dependencies)

- **Summarization**: Extractive — sentences scored by position, keyword frequency, and length; top-N selected to meet target ratio
- **Classification**: Keyword overlap with class label tokens
- **NER**: Regex patterns for emails, URLs, phone numbers, dates, money, percent
- **Sentiment**: AFINN-style word lexicon with negation handling and intensifiers
- **Keywords**: TF-IDF-like term scoring, with words from a built-in stopword list filtered out
- **Similarity**: Jaccard similarity on word sets
- **Chunking**: Recursive (paragraph→sentence→word), fixed-size, or sentence-boundary methods
- **Clustering**: TF-IDF + KMeans
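The extractive summarization bullet combines three signals: position, keyword frequency, and length. A minimal sketch of that idea follows; the stopword list, the naive regex sentence splitter, the 5-word length threshold, and the 0.5 / 0.3 / 0.2 weights are assumptions for illustration, not anynlp's actual values.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list.
STOPWORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "is", "it", "for", "on", "was", "with"}


def split_sentences(text):
    # Naive splitter: break after ., ! or ? followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def extractive_summary(text, max_sentences=3):
    sentences = split_sentences(text)
    if len(sentences) <= max_sentences:
        return " ".join(sentences)
    words = [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]
    freq = Counter(words)
    top = max(freq.values()) if freq else 1
    scores = []
    for idx, sent in enumerate(sentences):
        tokens = [w for w in re.findall(r"[a-z]+", sent.lower()) if w not in STOPWORDS]
        # Keyword frequency: mean normalized document frequency of the sentence's words.
        kw = sum(freq[t] / top for t in tokens) / len(tokens) if tokens else 0.0
        # Position: earlier sentences get a higher weight (lead bias).
        pos = 1.0 - idx / len(sentences)
        # Length: penalize sentences with fewer than 5 content words.
        length = min(len(tokens) / 5.0, 1.0)
        scores.append(0.5 * kw + 0.3 * pos + 0.2 * length)
    best = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:max_sentences]
    # Emit the chosen sentences in their original order.
    return " ".join(sentences[i] for i in sorted(best))
```

Restoring original order after ranking keeps the summary readable; ranking alone would shuffle sentences by score.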

### 2. HuggingFace backend (optional via `[transformers]`)

- Wraps **transformers.pipeline()** with pretrained models for each supported task
- Models auto-download from HuggingFace Hub
- Supports: summarization (BART), classification (zero-shot), NER (BERT), sentiment

### 3. LLM backend (optional via `[llm]`, uses anyllm)

- Sends the task to any LLM supported by anyllm (e.g. Ollama, OpenAI, Anthropic); usually the highest-quality option
- Structured outputs via JSON mode for entity/keyword extraction
- Best for: complex documents, multilingual text, custom schemas
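For the JSON-mode bullet, this sketch shows the general pattern of asking a model for JSON and validating the reply against the allowed labels. `complete` stands in for whatever call sends a prompt to a model (for example through anyllm); it, `build_prompt`, `llm_classify`, and the prompt wording are hypothetical, and no anyllm API is used here.

```python
import json
from typing import Callable, List, Tuple


def build_prompt(text: str, labels: List[str]) -> str:
    """Ask for a single label and a confidence, as a JSON object only."""
    options = ", ".join(labels)
    return (
        f"Classify the text into exactly one of these labels: {options}.\n"
        'Reply with JSON only, shaped like {"label": "<one label>", "score": <0..1>}.\n\n'
        f"Text: {text}"
    )


def llm_classify(text: str, labels: List[str], complete: Callable[[str], str]) -> Tuple[str, float]:
    """Send the prompt via `complete` (hypothetical) and validate the JSON reply."""
    raw = complete(build_prompt(text, labels))
    try:
        data = json.loads(raw)
        label = str(data["label"])
        score = float(data.get("score", 1.0))
    except (ValueError, KeyError, TypeError, AttributeError) as exc:
        raise ValueError(f"unparseable model reply: {raw!r}") from exc
    if label not in labels:
        raise ValueError(f"model returned unknown label {label!r}")
    # Clamp in case the model reports a confidence outside [0, 1].
    return label, max(0.0, min(1.0, score))
```

Rejecting unknown labels and clamping the score keeps a misbehaving model from leaking arbitrary strings or values into downstream code.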

## API Reference

| Function | Returns |
|---|---|
| `anynlp.summarize(text, max_sentences=3, backend="rules")` | `str` summary |
| `anynlp.classify(text, labels, backend="rules")` | `ClassificationResult` |
| `anynlp.ner(text, backend="rules")` | `list[Entity]` |
| `anynlp.sentiment(text, backend="rules")` | `SentimentResult` |
| `anynlp.keywords(text, top_k=10)` | `list` of `(term, score)` tuples |
| `anynlp.similarity(a, b)` | `float` in [0.0, 1.0] |
| `anynlp.chunk(text, method="recursive", size=500)` | `list[str]` |
| `anynlp.cluster(texts, k=5)` | `list[int]` cluster labels |
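The rules backend for `similarity(a, b)` is described above as Jaccard similarity on word sets, |A ∩ B| / |A ∪ B|, which is why the result falls in [0.0, 1.0]. A minimal sketch, assuming lowercase `\w+` tokenization and returning 1.0 for two empty texts (both choices are assumptions; `jaccard_similarity` is a hypothetical name):

```python
import re


def jaccard_similarity(a: str, b: str) -> float:
    """|A ∩ B| / |A ∪ B| over lowercase word sets."""
    set_a = set(re.findall(r"\w+", a.lower()))
    set_b = set(re.findall(r"\w+", b.lower()))
    if not set_a and not set_b:
        return 1.0  # convention chosen here: two empty texts count as identical
    return len(set_a & set_b) / len(set_a | set_b)
```

For example, "the cat sat" and "the cat ran" share 2 of 4 distinct words, giving 0.5. Because only word identity matters, synonyms and word order are ignored, which is the usual motivation for switching to a model-based backend.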

## CLI Usage

```bash
anynlp summarize article.txt --sentences 5
anynlp sentiment "I loved the book"
anynlp ner "Tim Cook visited Paris on 2024-05-01"
anynlp classify "GPU crash" --labels "hardware,software,billing"
anynlp chunk document.txt --method recursive --size 500
```

## Examples

### RAG chunking + clustering

```python
import anynlp

# Split documents for a RAG index
with open("book.txt", encoding="utf-8") as f:
    chunks = anynlp.chunk(f.read(), method="recursive", size=500, overlap=50)

# Cluster the chunks to find topic groups (TF-IDF + KMeans)
labels = anynlp.cluster(chunks, k=8)
```
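Conceptually, `method="recursive"` tries coarse boundaries first (paragraph, then sentence, then word) and falls back to finer ones only when a piece is still too long. The sketch below assumes `size` and `overlap` count characters and that overlap is carried as the tail of the previous chunk; `recursive_chunk`, `_split`, and `SEPARATORS` are hypothetical, and anynlp's actual units and overlap handling may differ.

```python
from typing import List

SEPARATORS = ["\n\n", ". ", " "]  # paragraph -> sentence -> word


def _split(text: str, size: int, seps: List[str]) -> List[str]:
    """Break text into pieces no longer than `size`, trying coarser separators first."""
    if len(text) <= size:
        return [text]
    if not seps:
        # Last resort: hard cut by characters.
        return [text[i:i + size] for i in range(0, len(text), size)]
    sep, rest = seps[0], seps[1:]
    parts = text.split(sep)
    if len(parts) == 1:
        return _split(text, size, rest)
    pieces = []
    for i, part in enumerate(parts):
        # Re-attach the separator so concatenating pieces reproduces the text.
        if i < len(parts) - 1:
            part += sep
        pieces.extend(_split(part, size, rest))
    return pieces


def recursive_chunk(text: str, size: int = 500, overlap: int = 50) -> List[str]:
    if not 0 <= overlap < size:
        raise ValueError("overlap must satisfy 0 <= overlap < size")
    chunks, current = [], ""
    # Pieces are capped at size - overlap so tail + piece never exceeds size.
    for piece in _split(text, size - overlap, SEPARATORS):
        if current and len(current) + len(piece) > size:
            chunks.append(current)
            # Start the next chunk with the tail of the previous one.
            current = current[-overlap:] if overlap else ""
        current += piece
    if current:
        chunks.append(current)
    return chunks
```

Capping pieces at `size - overlap` is what guarantees every chunk stays within `size` even after the carried-over tail is prepended.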

### Upgrade to HuggingFace when quality matters

```python
import anynlp

# Rule-based: fast, no downloads
s1 = anynlp.sentiment("complicated sentence with irony", backend="rules")

# HuggingFace: slower first call, better accuracy
s2 = anynlp.sentiment("complicated sentence with irony", backend="hf")

# LLM: highest quality, requires anyllm + a provider
s3 = anynlp.sentiment("complicated sentence with irony",
                      backend="llm", model="llama3.1:8b")
```
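Picking a backend per call raises the question of what happens when the matching extra is not installed; this README does not say whether anynlp falls back or raises. The sketch below shows one way a caller could check for an optional dependency before choosing a backend. `available`, `pick_backend`, and the backend-to-module mapping are hypothetical helpers, not part of anynlp.

```python
import importlib.util
from typing import Dict


def available(module: str) -> bool:
    """True if `module` is importable, checked without importing it."""
    try:
        return importlib.util.find_spec(module) is not None
    except (ImportError, ValueError):
        # find_spec raises e.g. for a dotted name whose parent package is missing.
        return False


def pick_backend(requested: str, requirements: Dict[str, str]) -> str:
    """Return `requested` if its dependency is installed, otherwise "rules".

    `requirements` maps backend names to the module each one needs; backends
    without an entry (such as "rules") are assumed to need nothing extra.
    """
    module = requirements.get(requested)
    if module is None or available(module):
        return requested
    return "rules"
```

For instance, `pick_backend("hf", {"hf": "transformers", "llm": "anyllm"})` returns `"hf"` only when `transformers` is installed. A silent downgrade hides a drop in quality, so logging a warning at the fallback point is often worth adding.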

### Zero-shot text classification with a local LLM

```python
import anynlp

result = anynlp.classify(
    "My order is 3 weeks late",
    labels=["billing", "shipping", "product", "other"],
    backend="llm",
    model="llama3.1:8b",
)
print(result.label, result.score)
```

## License

MIT (c) Viet-Anh Nguyen
