Metadata-Version: 2.4
Name: asia-fertility
Version: 0.2.0
Summary: Tokenizer fertility, cost, and multi-turn context-budget analyzer for low-resource Asian languages.
Project-URL: Homepage, https://fertiscope.vercel.app
Project-URL: Repository, https://github.com/Helmo21/asia-fertility
Project-URL: Documentation, https://helmo21.github.io/asia-fertility/
Author: Antoine Pedretti
License: MIT
License-File: LICENSE
Keywords: asian-languages,fertility,llm,low-resource,multilingual,tokenizer
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Requires-Python: >=3.11
Requires-Dist: numpy>=1.26
Requires-Dist: pydantic-settings>=2.3
Requires-Dist: pydantic>=2.7
Requires-Dist: pyyaml>=6.0
Requires-Dist: rich>=13.7
Requires-Dist: typer>=0.12
Provides-Extra: api
Requires-Dist: anthropic>=0.40; extra == 'api'
Requires-Dist: google-genai>=0.3; extra == 'api'
Requires-Dist: httpx>=0.27; extra == 'api'
Provides-Extra: dev
Requires-Dist: hypothesis>=6.112; extra == 'dev'
Requires-Dist: mypy>=1.11; extra == 'dev'
Requires-Dist: pre-commit>=3.8; extra == 'dev'
Requires-Dist: pytest-asyncio>=0.23; extra == 'dev'
Requires-Dist: pytest-cov>=5; extra == 'dev'
Requires-Dist: pytest>=8; extra == 'dev'
Requires-Dist: respx>=0.21; extra == 'dev'
Requires-Dist: ruff>=0.6; extra == 'dev'
Requires-Dist: types-pyyaml; extra == 'dev'
Provides-Extra: docs
Requires-Dist: mkdocs-material>=9.5; extra == 'docs'
Requires-Dist: mkdocstrings[python]>=0.26; extra == 'docs'
Requires-Dist: pymdown-extensions>=10; extra == 'docs'
Provides-Extra: hf
Requires-Dist: datasets>=2.21; extra == 'hf'
Requires-Dist: huggingface-hub>=0.25; extra == 'hf'
Requires-Dist: khmer-nltk>=1.6; extra == 'hf'
Requires-Dist: laonlp>=1.3; extra == 'hf'
Requires-Dist: pythainlp>=5.0; extra == 'hf'
Requires-Dist: sentencepiece>=0.2; extra == 'hf'
Requires-Dist: tokenizers>=0.20; extra == 'hf'
Requires-Dist: transformers>=4.44; extra == 'hf'
Provides-Extra: niah
Requires-Dist: httpx>=0.27; extra == 'niah'
Requires-Dist: tenacity>=8.5; extra == 'niah'
Provides-Extra: oai
Requires-Dist: tiktoken>=0.8; extra == 'oai'
Provides-Extra: viz
Requires-Dist: matplotlib>=3.9; extra == 'viz'
Requires-Dist: pandas>=2.2; extra == 'viz'
Requires-Dist: pyarrow>=17; extra == 'viz'
Description-Content-Type: text/markdown

# asia-fertility 🌏

**The hidden multilingual tax in your tokenizer — measured before you deploy.**

[![CI](https://github.com/Helmo21/asia-fertility/actions/workflows/ci.yml/badge.svg)](https://github.com/Helmo21/asia-fertility/actions/workflows/ci.yml)
[![License](https://img.shields.io/badge/license-MIT-blue)](LICENSE)
[![Python](https://img.shields.io/badge/python-3.11%2B-blue)](pyproject.toml)

> **Status: v0.2 under active construction.** See [`ROADMAP.md`](ROADMAP.md) and the per-phase specs in [`tasks/`](tasks/).

`asia-fertility` measures the structural cost penalty that LLM tokenizers impose on lower-resource Asian languages. On a frontier tokenizer, the same content can cost up to 11× more tokens in Burmese than in English. That overhead silently inflates API bills, shrinks the usable context window, and leaves room for fewer in-context examples.
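
To make the penalty concrete, here is a minimal, stdlib-only sketch of how a token ratio between two parallel texts could be computed, and what it does to a context budget. It is not the package's API: `token_ratio`, `utf8_byte_tokenizer`, and `effective_context` are hypothetical names, the toy tokenizer just splits text into raw UTF-8 bytes (a real BPE tokenizer merges bytes and will give different numbers), and the 128,000-token window is only an illustrative figure.

```python
from typing import Callable, Sequence

# Any callable that maps a string to a sequence of tokens.
Tokenizer = Callable[[str], Sequence[object]]


def utf8_byte_tokenizer(text: str) -> list[int]:
    """Toy stand-in for a real tokenizer: one token per UTF-8 byte (assumption)."""
    return list(text.encode("utf-8"))


def token_ratio(tokenize: Tokenizer, text: str, reference: str) -> float:
    """Token count of `text` divided by the token count of parallel `reference`."""
    reference_tokens = len(tokenize(reference))
    if reference_tokens == 0:
        raise ValueError("reference text produced no tokens")
    return len(tokenize(text)) / reference_tokens


def effective_context(window_tokens: int, ratio: float) -> int:
    """Context window expressed in reference-language-equivalent tokens."""
    if ratio <= 0:
        raise ValueError("ratio must be positive")
    return int(window_tokens / ratio)


if __name__ == "__main__":
    # Myanmar-script code points take 3 bytes each in UTF-8; ASCII takes 1.
    ratio = token_ratio(utf8_byte_tokenizer, "မြန်မာ", "Myanmar")
    print(f"toy token ratio: {ratio:.2f}x")

    # Illustrative only: an 11x ratio applied to a hypothetical 128k window.
    print(effective_context(128_000, 11.0))
```

Swapping `utf8_byte_tokenizer` for a real tokenizer's encode function keeps the rest of the sketch unchanged, since `token_ratio` only needs a callable that returns a sequence.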

## Quickstart (once v0.3 ships)

```bash
pip install "asia-fertility[oai]"
asia-fertility reproduce
```

## What's currently usable

- v0.1 Python prototype: `legacy_v01/fertiscope/` (EN↔VI only, CLI).
- Live Next.js web demo: [fertiscope.vercel.app](https://fertiscope.vercel.app).
- 41 implementation specs: [`tasks/`](tasks/).

## License

MIT © 2026 Antoine Pedretti. Bundled FLORES-200 data: CC-BY-SA 4.0 (Meta NLLB).
