Metadata-Version: 2.4
Name: gpe-tokenizer
Version: 0.1.2
Summary: Grapheme Pair Encoding Tokenizer for Sinhala Language
Author: Schizo00
Author-email: naween.k@live.com
Requires-Python: >=3.13, <3.15
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.13
Classifier: Programming Language :: Python :: 3.14
Requires-Dist: accelerate (>=1.10.1,<2.0.0)
Requires-Dist: datasets (>=4.2.0,<5.0.0)
Requires-Dist: grapheme (>=0.6.0,<0.7.0)
Requires-Dist: huggingface-hub (>=0.35.3,<0.36.0)
Requires-Dist: joblib (>=1.5.2,<2.0.0)
Requires-Dist: regex (>=2025.9.18,<2026.0.0)
Requires-Dist: tokenizers (>=0.22.1)
Requires-Dist: torch (>=2.9.0,<3.0.0)
Requires-Dist: transformers (>=4.57.1,<5.0.0)
Description-Content-Type: text/markdown

## Tokenizer Training Details
#### Corpus Size: 5 Million Sentences
#### Vocab Size: 16000
#### Training Time: 2 h 43 min
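For readers unfamiliar with the idea, the sketch below illustrates what "grapheme pair encoding" could look like: byte-pair-style merges applied to grapheme clusters rather than bytes or code points, so that a Sinhala consonant and its vowel signs are never split apart. This is a simplified illustration, not this package's implementation. The function names (`split_graphemes`, `most_frequent_pair`, `merge_pair`, `train`) are hypothetical, and the grapheme splitter is a standard-library approximation that attaches combining marks and ZWJ-joined characters to the preceding base; a full Unicode segmenter (for example the `grapheme` or `regex` dependency listed above) would be more accurate.

```python
import unicodedata
from collections import Counter

ZWJ = "\u200d"  # zero-width joiner, used in Sinhala conjunct forms


def split_graphemes(text):
    """Approximate grapheme clusters (assumption: not full UAX #29 rules).

    A character is glued to the previous cluster if it is a combining mark,
    a ZWJ, or if the previous cluster ends with a ZWJ.
    """
    clusters = []
    for ch in text:
        is_mark = unicodedata.category(ch).startswith("M")
        joins_previous = bool(clusters) and (
            is_mark or ch == ZWJ or clusters[-1].endswith(ZWJ)
        )
        if joins_previous:
            clusters[-1] += ch
        else:
            clusters.append(ch)
    return clusters


def most_frequent_pair(sequences):
    """Return the most common adjacent pair of symbols, or None if there is none."""
    counts = Counter()
    for seq in sequences:
        counts.update(zip(seq, seq[1:]))
    return counts.most_common(1)[0][0] if counts else None


def merge_pair(seq, pair):
    """Replace every non-overlapping occurrence of `pair` in `seq` with one merged symbol."""
    merged, i = [], 0
    while i < len(seq):
        if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
            merged.append(seq[i] + seq[i + 1])
            i += 2
        else:
            merged.append(seq[i])
            i += 1
    return merged


def train(words, num_merges):
    """Learn up to `num_merges` merges over grapheme sequences (hypothetical helper)."""
    sequences = [split_graphemes(w) for w in words]
    merges = []
    for _ in range(num_merges):
        pair = most_frequent_pair(sequences)
        if pair is None:
            break
        merges.append(pair)
        sequences = [merge_pair(s, pair) for s in sequences]
    return merges, sequences
```

In a scheme like this, the vocabulary would consist of the base grapheme clusters plus one entry per learned merge, so a target vocabulary size such as 16000 bounds the number of merges performed.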
