Metadata-Version: 2.4
Name: sindhinltk
Version: 1.0.2
Summary: A Morphology-Aware NLP Toolkit for the Sindhi Language
Author-email: Aakash Meghwar <aakashmeghwar01@gmail.com>
License: MIT
Project-URL: Homepage, https://github.com/AakashKumarMissrani/SindhiNLTK
Classifier: Programming Language :: Python :: 3
Classifier: Operating System :: OS Independent
Requires-Python: >=3.8
Description-Content-Type: text/markdown
Requires-Dist: transformers
Requires-Dist: regex
Requires-Dist: torch

# 🛡️ SindhiNLTK: Morphology-Aware NLP Toolkit for Sindhi

[![PyPI version](https://img.shields.io/pypi/v/sindhinltk.svg)](https://pypi.org/project/sindhinltk/)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)

**SindhiNLTK** is a high-performance Python library designed to address "subword shattering" in Sindhi language processing. The tokenizers of standard multilingual models (such as Llama-3 and mBERT) break distinctive Sindhi orthographic clusters into semantically meaningless tokens.

SindhiNLTK introduces a **V3 Linguistic Shield**, a hybrid Regex-BPE architecture that protects the integrity of the 52-letter Sindhi alphabet, reducing the token fertility rate (average tokens per word) from 4.15 to 1.06 and preserving 100% of aspirated digraphs.
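
To make the two ideas above concrete, here is a minimal, self-contained sketch of a regex "shield" and a fertility calculation. It is not SindhiNLTK's implementation or API. The names `shielded_units`, `naive_units`, and `fertility` are hypothetical, and the sketch assumes that an aspirated digraph is encoded as a base consonant followed by U+06BE (ھ). It works at the character level only and does not include the BPE stage.

```python
import re

# Assumption (not taken from SindhiNLTK): an aspirated digraph is a base
# consonant followed by U+06BE (ARABIC LETTER HEH DOACHASHMEE).
ASPIRATION_MARK = "\u06BE"

# One unit = a non-space character plus an optional aspiration mark.
# A stray aspiration mark with no base character becomes its own unit.
UNIT_PATTERN = re.compile(
    rf"[^\s{ASPIRATION_MARK}]{ASPIRATION_MARK}?|{ASPIRATION_MARK}"
)


def shielded_units(word):
    """Split a word into units, keeping base + aspiration mark together."""
    return UNIT_PATTERN.findall(word)


def naive_units(word):
    """Split a word into single code points, which breaks digraphs apart."""
    return list(word)


def fertility(words, tokenize):
    """Average number of tokens produced per word."""
    if not words:
        raise ValueError("words must not be empty")
    return sum(len(tokenize(w)) for w in words) / len(words)


if __name__ == "__main__":
    # Hypothetical sample: one word containing a digraph, one without.
    sample = ["\u06AF\u06BE\u0631", "\u0646\u0627\u0644\u0648"]
    print("naive fertility:   ", fertility(sample, naive_units))
    print("shielded fertility:", fertility(sample, shielded_units))
```

The regex step only decides which character sequences must never be split. In a hybrid design, a subword model such as BPE would then operate on these protected units rather than on raw code points.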

---

## 🚀 Installation

```bash
pip install sindhinltk
