Metadata-Version: 2.4
Name: kaalin
Version: 3.3.2.post1
Summary: Karakalpak language toolkit for Python — Latin/Cyrillic script conversion, number-to-words, and string utilities
Author-email: Turdibek Jumabaev <turdibekjumabaev05@gmail.com>
License-Expression: MIT
Project-URL: Homepage, https://github.com/dontbeidle/kaalin-python
Project-URL: Repository, https://github.com/dontbeidle/kaalin-python
Project-URL: Issues, https://github.com/dontbeidle/kaalin-python/issues
Project-URL: PyPI, https://pypi.org/project/kaalin/
Keywords: karakalpak,karakalpakstan,latin,cyrillic,transliteration,script-conversion,num2words,number-to-words,nlp,language-tools,turkic
Classifier: Development Status :: 5 - Production/Stable
Classifier: Intended Audience :: Developers
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Operating System :: OS Independent
Classifier: Topic :: Text Processing :: Linguistic
Requires-Python: >=3.10
Description-Content-Type: text/markdown
License-File: LICENSE
Dynamic: license-file

# Kaalin

[![PyPI version](https://img.shields.io/pypi/v/kaalin)](https://pypi.org/project/kaalin/)
[![Python](https://img.shields.io/pypi/pyversions/kaalin)](https://pypi.org/project/kaalin/)
[![License: MIT](https://img.shields.io/badge/License-MIT-green.svg)](https://opensource.org/licenses/MIT)

A Python toolkit for the **Karakalpak language**: Latin-Cyrillic script conversion, number-to-words, and locale-aware string operations. Zero dependencies.

## Quick Start

```bash
pip install kaalin
```

```python
from kaalin.converter import latin2cyrillic, cyrillic2latin

print(latin2cyrillic("Assalawma áleykum"))  # Ассалаўма әлейкум
print(cyrillic2latin("Ассалаўма әлейкум"))  # Assalawma áleykum
```

## Supported Features

| Feature | Description |
|---|---|
| **Script Conversion** | Bidirectional Latin ↔ Cyrillic conversion with multi-character mapping (`sh`→`ш`, `ch`→`ч`) and special Cyrillic rules (`ьи`→`yi`, `ьо`→`yo`, `ъе`→`ye`) |
| **Number to Words** | Converts integers and floats to Karakalpak words in Latin or Cyrillic script. Supports magnitudes from 0 up to 10³⁰, negative numbers, and decimal fractions |
| **Word Syllabification** | Splits Karakalpak words into syllables, works with both Latin and Cyrillic scripts, preserves letter case, and recognises digraphs like `sh`, `ch`, `yu`, `ya`, `aw`, `ew` |
| **String Utilities** | Karakalpak-aware `upper()` / `lower()` that correctly handle the dotless `ı` ↔ `Í` character pair |
| **CLI Tools** | `cyr2lat` and `lat2cyr` commands for converting text files from the terminal |

## API Reference

### Script Conversion

```python
from kaalin.converter import latin2cyrillic, cyrillic2latin

latin2cyrillic("Qaraqalpaqstan")    # Қарақалпақстан
cyrillic2latin("Қарақалпақстан")    # Qaraqalpaqstan
```

Both functions accept a `str` and return a `str`. The converter handles uppercase, lowercase, and mixed-case text.
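
Multi-character mappings such as `sh`→`ш` and `ch`→`ч` need longer Latin sequences to be tried before single letters. The sketch below illustrates that idea with a greedy longest-match loop. It is not kaalin's implementation: `LATIN_TO_CYRILLIC` and `sketch_latin2cyrillic` are hypothetical names, and the table is a small subset inferred from the examples in this README.

```python
# Illustrative subset only, inferred from the README examples (not kaalin's table).
LATIN_TO_CYRILLIC = {
    "sh": "ш", "ch": "ч",
    "a": "а", "\u00e1": "ә",  # \u00e1 is á
    "e": "е", "k": "к", "l": "л", "m": "м", "n": "н", "p": "п",
    "q": "қ", "r": "р", "s": "с", "t": "т", "u": "у", "w": "ў", "y": "й",
}
MAX_KEY_LEN = max(len(key) for key in LATIN_TO_CYRILLIC)


def sketch_latin2cyrillic(text: str) -> str:
    """Greedy longest-match transliteration over the partial table above."""
    out = []
    i = 0
    while i < len(text):
        # Try the longest candidate first so "sh" wins over "s" + "h".
        for size in range(MAX_KEY_LEN, 0, -1):
            chunk = text[i:i + size]
            target = LATIN_TO_CYRILLIC.get(chunk.lower())
            if target is not None:
                # Carry over the case of the first Latin letter.
                out.append(target.upper() if chunk[0].isupper() else target)
                i += len(chunk)
                break
        else:
            # Characters outside the table pass through unchanged.
            out.append(text[i])
            i += 1
    return "".join(out)


print(sketch_latin2cyrillic("Qaraqalpaqstan"))  # Қарақалпақстан
```

With this partial table the sketch reproduces the `Qaraqalpaqstan` and `Assalawma áleykum` examples; for real text, use `latin2cyrillic` from the library.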

### Number to Words

```python
from kaalin.number import to_word, NumberRangeError

to_word(123)                     # bir júz jigirma úsh
to_word(999, num_type="cyr")     # тоғыз жүз тоқсан тоғыз
to_word(12.75)                   # on eki pútin júzden jetpis bes
to_word(-42)                     # minus qırıq eki
```

**Parameters:**
- `number` (`int | float`) — the number to convert
- `num_type` (`str`) — output script: `"lat"` (default) or `"cyr"`

**Raises:** `NumberRangeError` if `number` exceeds 10³⁰.
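
The `12.75` example reads as the integer part (`on eki`), the word `pútin`, a word built on `júz` (the hundred from the `123` example), and then the fractional digits read as a whole number (`jetpis bes`). Assuming that structure, which is inferred from the example output and not from kaalin's source, the numeric decomposition could be sketched as follows. `split_decimal` is a hypothetical helper and does not call the library.

```python
from decimal import Decimal


def split_decimal(number):
    """Split a number into (sign, whole, numerator, denominator).

    Hypothetical helper: 12.75 becomes (1, 12, 75, 100).
    """
    # Going through str() keeps the digits as written (0.1 stays "0.1").
    value = Decimal(str(number))
    sign = -1 if value < 0 else 1
    value = abs(value)
    whole = int(value)
    exponent = value.as_tuple().exponent
    # A negative exponent gives the number of digits after the decimal point.
    digits = -exponent if exponent < 0 else 0
    denominator = 10 ** digits
    numerator = int((value - whole) * denominator)
    return sign, whole, numerator, denominator


print(split_decimal(12.75))  # (1, 12, 75, 100)
print(split_decimal(-42))    # (-1, 42, 0, 1)
```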

### Word Syllabification

```python
from kaalin.syllable import syllabify

syllabify("qaraqalpaqstan")   # ['qa', 'ra', 'qal', 'paq', 'stan']
syllabify("kompyuter")        # ['kom', 'pyu', 'ter']
syllabify("Шарапат")          # ['Ша', 'ра', 'пат']
syllabify("Adam")             # ['A', 'dam']

"-".join(syllabify("úydegiler"))   # 'úy-de-gi-ler'
```

**Parameters:**
- `word` (`str`) — the word to split. Accepts Latin or Cyrillic input.

**Returns:** A `list[str]` of syllables in the same script as the input. Words with fewer than two vowels are returned as a single-element list unchanged.

**Raises:** `TypeError` if `word` is not a string.
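
In the Latin examples above, the number of syllables equals the number of vowels in the word, which fits the rule that a word with fewer than two vowels is returned whole. The sketch below counts vowels under an assumed Latin vowel inventory (`a á e i ı o ó u ú`). `count_vowels` and `is_single_syllable` are hypothetical helpers, not part of kaalin, and Cyrillic input is not covered.

```python
# Assumed Latin vowel inventory: a á e i ı o ó u ú
LATIN_VOWELS = frozenset("a\u00e1ei\u0131o\u00f3u\u00fa")


def count_vowels(word: str) -> int:
    """Count Latin-script vowels in a word, ignoring case."""
    # str.lower() maps Í (U+00CD) to í, so normalise it to dotless ı first.
    normalised = word.replace("\u00cd", "\u0131").lower()
    return sum(ch in LATIN_VOWELS for ch in normalised)


def is_single_syllable(word: str) -> bool:
    """True when the word has fewer than two vowels."""
    return count_vowels(word) < 2


print(count_vowels("qaraqalpaqstan"))  # 5
print(count_vowels("kompyuter"))       # 3
print(is_single_syllable("bes"))       # True
```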

### String Utilities

```python
from kaalin.string import upper, lower

upper("Assalawma áleykum")   # ASSALAWMA ÁLEYKUM
lower("ASSALAWMA ÁLEYKUM")   # assalawma áleykum
```

Python's built-in `str.upper()` and `str.lower()` do not handle the Karakalpak dotless `ı` correctly, because they do not treat `ı` and `Í` as a case pair. These functions do.
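
To make the problem concrete: the built-in methods map `ı` to `I` and `Í` to `í`. The sketch below shows that behaviour and one minimal way to work around it by swapping the pair before the generic case conversion. `sketch_upper` and `sketch_lower` are hypothetical names, not kaalin's functions, and the sketch assumes precomposed characters (U+0131 and U+00CD).

```python
DOTLESS_I = "\u0131"         # ı
CAPITAL_I_ACUTE = "\u00cd"   # Í


def sketch_upper(text: str) -> str:
    """Uppercase text, mapping ı to Í instead of I."""
    return text.replace(DOTLESS_I, CAPITAL_I_ACUTE).upper()


def sketch_lower(text: str) -> str:
    """Lowercase text, mapping Í to ı instead of í."""
    return text.replace(CAPITAL_I_ACUTE, DOTLESS_I).lower()


word = "q\u0131r\u0131q"       # qırıq, from the number example above
print(word.upper())            # QIRIQ (built-in: the dotless ı is lost)
print(sketch_upper(word))      # QÍRÍQ
print(sketch_lower(sketch_upper(word)) == word)  # True
```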

## CLI Usage

Convert text files between scripts directly from the terminal:

```bash
# Cyrillic → Latin
cyr2lat input.txt              # writes input-lat.txt
cyr2lat input.txt output.txt   # writes output.txt

# Latin → Cyrillic
lat2cyr input.txt              # writes input-cyr.txt
lat2cyr input.txt output.txt   # writes output.txt
```
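
The comments above suggest that, when no output file is given, the output name is the input name with `-lat` or `-cyr` inserted before the extension. A minimal sketch of that naming rule with `pathlib` follows. `default_output_path` is a hypothetical helper, and how kaalin treats files without an extension is not stated here.

```python
from pathlib import Path


def default_output_path(input_path: str, script_tag: str) -> Path:
    """Insert '-<script_tag>' between the file stem and its extension."""
    path = Path(input_path)
    return path.with_name(f"{path.stem}-{script_tag}{path.suffix}")


print(default_output_path("input.txt", "lat").name)  # input-lat.txt
print(default_output_path("input.txt", "cyr").name)  # input-cyr.txt
```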

## When to Use Kaalin

- Converting Karakalpak text between Latin and Cyrillic scripts
- Displaying numbers as Karakalpak words (invoices, checks, education)
- Splitting words into syllables for hyphenation, typesetting, or language learning
- NLP preprocessing for Karakalpak text (script normalization)
- Building Karakalpak-language applications that need locale-aware string operations
- Batch-converting text files via CLI

## When NOT to Use Kaalin

- **Not a translator** — it converts scripts (Latin ↔ Cyrillic), not languages
- **Not a spell-checker** — it does not validate or correct Karakalpak text
- **Not for other Turkic languages** — Kazakh, Uzbek, Turkish, etc. have different alphabets and rules
- **Not an OCR tool** — it works with digital text, not images
