Metadata-Version: 2.4
Name: twat-text
Version: 2.7.7
Project-URL: Documentation, https://github.com/twardoch/twat-text#readme
Project-URL: Issues, https://github.com/twardoch/twat-text/issues
Project-URL: Source, https://github.com/twardoch/twat-text
Author-email: Adam Twardoch <adam+github@twardoch.com>
License-Expression: MIT
License-File: LICENSE
Classifier: Development Status :: 4 - Beta
Classifier: Programming Language :: Python
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: Implementation :: CPython
Classifier: Programming Language :: Python :: Implementation :: PyPy
Requires-Python: >=3.10
Requires-Dist: twat>=1.8.1
Provides-Extra: all
Requires-Dist: twat>=1.8.1; extra == 'all'
Provides-Extra: build
Requires-Dist: build>=1.0.0; extra == 'build'
Requires-Dist: hatch-vcs>=0.4.0; extra == 'build'
Requires-Dist: hatchling>=1.27.0; extra == 'build'
Requires-Dist: pyinstaller>=6.0.0; extra == 'build'
Provides-Extra: dev
Requires-Dist: mypy>=1.15.0; extra == 'dev'
Requires-Dist: pre-commit>=4.1.0; extra == 'dev'
Requires-Dist: pyinstaller>=6.0.0; extra == 'dev'
Requires-Dist: ruff>=0.9.6; extra == 'dev'
Provides-Extra: docs
Requires-Dist: mkdocs-material>=9.5.0; extra == 'docs'
Requires-Dist: mkdocs>=1.6.0; extra == 'docs'
Requires-Dist: mkdocstrings[python]>=0.27.0; extra == 'docs'
Provides-Extra: test
Requires-Dist: pytest-benchmark[histogram]>=5.1.0; extra == 'test'
Requires-Dist: pytest-cov>=6.0.0; extra == 'test'
Requires-Dist: pytest-xdist>=3.6.1; extra == 'test'
Requires-Dist: pytest>=8.3.4; extra == 'test'
Description-Content-Type: text/markdown

# twat-text

`twat-text` is the user-facing text utility package in the twat ecosystem.

It keeps deterministic text algorithms local and delegates LLM-powered operations to `twat_llm` through a narrow adapter. Generated images, video, audio, and speech belong in `twat_genai` and media domain packages, not here.

## Install

```bash
pip install twat-text
```

## Deterministic API

```python
from twat_text import clean_text, chunk_text, extract_emails, convert_text, estimate_tokens

text = clean_text(" “Hello”\tworld ")
chunks = chunk_text(long_document, max_chars=2000, overlap=100)
emails = extract_emails("Ada <ada@example.com>")
plain = convert_text("<p>Hello &amp; welcome</p>", source="html", target="plain")
tokens = estimate_tokens(plain)
```

Available deterministic helpers include:

- `normalize_text` and `clean_text`
- `chunk_text` and `context_window_chunks`
- `extract_urls`, `extract_emails`, and `extract_numbers`
- `markdown_to_plain`, `html_to_plain`, `plain_to_html`, and `convert_text`
- `estimate_tokens`
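
To make the `max_chars` and `overlap` parameters concrete, here is a minimal, self-contained sketch of character-based chunking with overlap. It is an illustration only, not the package's implementation: the function name `chunk_with_overlap` and its exact boundary behavior are assumptions, and the real `chunk_text` may split on different boundaries.

```python
def chunk_with_overlap(text: str, max_chars: int, overlap: int) -> list[str]:
    """Split text into windows of at most max_chars characters.

    Each window after the first repeats the last `overlap` characters
    of the previous window. Hypothetical helper, not part of twat_text.
    """
    if max_chars <= 0:
        raise ValueError("max_chars must be positive")
    if not 0 <= overlap < max_chars:
        raise ValueError("overlap must satisfy 0 <= overlap < max_chars")

    step = max_chars - overlap  # how far the window start advances each time
    chunks: list[str] = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + max_chars])
        if start + max_chars >= len(text):
            break  # this window already reached the end of the text
        start += step
    return chunks


print(chunk_with_overlap("abcdefghij", max_chars=4, overlap=1))
# ['abcd', 'defg', 'ghij']
```

Requiring `overlap < max_chars` keeps the window moving forward, so the loop always terminates.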

## LLM-backed API

```python
from twat_text import summarize, rewrite, extract_structured, classify

summary = summarize(long_text)
rewritten = rewrite(draft, instruction="Make this friendlier")
data = extract_structured(note, schema_hint="Return JSON with date and total")
label = classify(message, labels=["support", "sales", "spam"])
```

These functions import `twat_llm` only when called, so deterministic utilities stay dependency-light and testable.
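
The deferred-import pattern described above can be sketched roughly as follows. This is a generic illustration under stated assumptions: `call_lazily` is a hypothetical helper, not part of `twat_text`, and the example uses the standard-library `json` module as a stand-in because the function names exposed by `twat_llm` are not shown here.

```python
import importlib
from typing import Any


def call_lazily(module_name: str, func_name: str, *args: Any, **kwargs: Any) -> Any:
    """Import `module_name` only at call time, then call `func_name` from it.

    Hypothetical helper for illustration. Nothing is imported when this
    function is defined, so a missing optional dependency only surfaces
    when the operation is actually used.
    """
    try:
        module = importlib.import_module(module_name)
    except ImportError as exc:
        raise RuntimeError(
            f"{module_name} is required for this operation but is not installed"
        ) from exc
    return getattr(module, func_name)(*args, **kwargs)


# Stand-in usage with a standard-library module.
print(call_lazily("json", "dumps", {"a": 1}))
# {"a": 1}
```

With this shape, importing the package stays cheap, and tests for the deterministic helpers never need the optional backend installed.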

## CLI

```bash
twat-text clean "  Hello   world  "
twat-text chunk "long text..." --max-chars 500 --overlap 50
twat-text convert "<p>Hello</p>" --source html --target plain
twat-text summarize "long text..."
twat-text rewrite "draft text" --instruction "Make it concise"
```

Through the host plugin dispatcher, the same package is available as `twat text ...` once installed.

## Development

```bash
hatch run test
hatch run lint
hatch run type-check
```

The `Config` and `process_data` compatibility API remains available. `process_data` now returns cleaned text, chunks, simple extracted values, and an estimated token count rather than an empty placeholder.
