Real-time text-to-speech (TTS) for Modern Hebrew remains challenging due to the language’s complex orthography. Many existing solutions overlook important phonetic features such as stress, which remain unspecified in the written text even when vowel marks are used.
To overcome these issues, we present Phonikud, a lightweight, open-source grapheme-to-phoneme (G2P) system that outputs fully specified IPA transcriptions for Hebrew text. Our method adapts an existing diacritization model with lightweight adaptors, adding minimal latency.
Additionally, we release ILSpeech, a novel Hebrew speech dataset with expert-annotated IPA transcriptions. This dataset serves as the first benchmark for Hebrew G2P and provides valuable training data for TTS systems.
Experimental results show that Phonikud achieves significantly higher Hebrew G2P accuracy than previous approaches, which in turn improves the quality of downstream TTS systems.
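To make the notion of G2P accuracy concrete, here is a minimal sketch of how predicted IPA strings could be scored against reference transcriptions using edit-distance-based word and character error rates. The metric choice and the function names `edit_distance` and `error_rates` are illustrative assumptions; this page does not state which metrics the benchmark uses.

```python
from typing import Sequence, Tuple


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance between two sequences of symbols."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # delete a reference symbol
                curr[j - 1] + 1,     # insert a hypothesis symbol
                prev[j - 1] + cost,  # substitute or match
            ))
        prev = curr
    return prev[-1]


def error_rates(refs: Sequence[str], hyps: Sequence[str]) -> Tuple[float, float]:
    """Return (word error rate, character error rate) over paired IPA strings.

    Assumption: words are whitespace-separated and each Unicode code point
    counts as one character, so the stress mark is scored like any phoneme.
    """
    word_errs = word_total = char_errs = char_total = 0
    for ref, hyp in zip(refs, hyps):
        ref_words, hyp_words = ref.split(), hyp.split()
        word_errs += edit_distance(ref_words, hyp_words)
        word_total += len(ref_words)
        char_errs += edit_distance(ref, hyp)
        char_total += len(ref)
    return word_errs / word_total, char_errs / char_total


# Stress alone distinguishes /ˈbira/ ("beer") from /biˈra/ ("capital").
wer, cer = error_rates(["ˈbira"], ["biˈra"])
print(f"WER={wer:.2f} CER={cer:.2f}")
```

In this example the two transcriptions contain the same segments and differ only in where the stress mark sits, yet the prediction is still counted as wrong, which is why a fully specified IPA output matters for evaluation.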
Listen to samples generated by our Phonikud system across different speakers and text complexity levels.
Comparative evaluation of Phonikud against existing Hebrew TTS approaches
| Text Sample | MMS | RoboShaul | LoHTM | Phonikud (Ours) |
|---|---|---|---|---|
| אז אני חושב שזה צירוף של כמה דברים | ||||
| כל מה שאתם צריכים לעשות כדי לזכות | ||||
| עוזבים את החברה, נגיד, אחרי שנה מסויימת | ||||
@misc{alper2025phonikud,
title={Phonikud: Hebrew Grapheme-to-Phoneme Conversion for Real-Time},
author={Yakov Kolani and Maxim Melichov and Cobi Calev and Morris Alper},
year={2025},
eprint={2025.xxxxx},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2025.xxxxx},
}