Phonikud: Hebrew Grapheme-to-Phoneme Conversion for Real-Time Text-to-Speech

📅 2025-06-14
🤖 AI Summary
Modern Hebrew text-to-speech (TTS) is difficult because of the language's complex orthography, unmarked stress, and vowel diacritics that are usually omitted. As a result, existing grapheme-to-phoneme (G2P) approaches cannot deliver accurate, low-latency International Phonetic Alphabet (IPA) transcriptions. To address this, we propose a lightweight two-stage G2P framework. Stage I uses a pre-trained Hebrew diacritization model. Stage II adds a compact neural adapter, followed by rule-based post-processing and mapping to IPA. We also release ILSpeech, an open-source Hebrew speech dataset annotated with IPA transcriptions. Our method produces fully specified phonemic transcriptions with negligible additional latency, and it predicts phonemes more accurately than prior G2P methods. This enables training a real-time Hebrew TTS system with a better speed-accuracy trade-off than existing alternatives. All code, models, and data are open-sourced.
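
The following rough estimate shows why an adapter head can add so little latency. It counts multiply-accumulate operations (MACs) per token for a shared Transformer encoder and compares them with a single linear head on top of that encoder. The encoder size (hidden size 768, 12 layers, as in BERT-base) and the three per-token outputs are assumptions for illustration only. The source does not state the size of the diacritization model or the exact form of Phonikud's adapters.

```python
# Back-of-envelope compute estimate for a linear adapter head on a shared encoder.
# Assumed sizes (hypothetical): BERT-base-like encoder, 3 extra per-token outputs.


def encoder_macs_per_token(d_model: int, n_layers: int) -> int:
    """Approximate MACs per token for a standard Transformer encoder.

    Per layer: Q, K, V and output projections (4 * d^2) plus a feed-forward
    block with inner size 4 * d (2 * d * 4d = 8 * d^2). Terms that grow with
    sequence length (attention scores) and layer norms are ignored.
    """
    return n_layers * 12 * d_model * d_model


def head_macs_per_token(d_model: int, n_outputs: int) -> int:
    """MACs per token for one linear layer mapping a hidden state to logits."""
    return d_model * n_outputs


d_model, n_layers, n_outputs = 768, 12, 3  # assumed sizes, not from the paper
enc = encoder_macs_per_token(d_model, n_layers)
head = head_macs_per_token(d_model, n_outputs)
print(f"encoder: {enc:,} MACs/token")         # 84,934,656
print(f"adapter head: {head:,} MACs/token")   # 2,304
print(f"relative overhead: {head / enc:.1e}")  # 2.7e-05
```

Because the head reuses hidden states that the diacritization encoder already computes, its marginal cost is tiny by comparison. Measured latency also depends on tokenization, batching, and the rule-based post-processing, so this ratio is only indicative.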

📝 Abstract
Real-time text-to-speech (TTS) for Modern Hebrew is challenging due to the language's orthographic complexity. Existing solutions ignore crucial phonetic features such as stress that remain underspecified even when vowel marks are added. To address these limitations, we introduce Phonikud, a lightweight, open-source Hebrew grapheme-to-phoneme (G2P) system that outputs fully-specified IPA transcriptions. Our approach adapts an existing diacritization model with lightweight adaptors, incurring negligible additional latency. We also contribute the ILSpeech dataset of transcribed Hebrew speech with IPA annotations, serving as a benchmark for Hebrew G2P and as training data for TTS systems. Our results demonstrate that Phonikud G2P conversion more accurately predicts phonemes from Hebrew text compared to prior methods, and that this enables training of effective real-time Hebrew TTS models with superior speed-accuracy trade-offs. We release our code, data, and models at https://phonikud.github.io.
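
To make the stress problem concrete, consider a commonly cited example: בירה carries identical vowel points whether it is read bíra 'beer' (penultimate stress) or birá 'capital' (final stress). The toy converter below covers only the handful of letters and points this word needs. It takes stress as an explicit syllable index, standing in for what a stress predictor would supply, and it uses ʁ for resh. The letter tables, the silent-letter rules, and the function names are hypothetical simplifications, not Phonikud's actual rules or code.

```python
import unicodedata

# Toy Hebrew-to-IPA mapping (hypothetical, heavily simplified): only a few
# letters and points are covered; unknown letters raise KeyError.
DAGESH = "\u05bc"
VOWEL_POINTS = {"\u05b4": "i", "\u05b7": "a", "\u05b8": "a", "\u05b5": "e",
                "\u05b6": "e", "\u05b9": "o", "\u05bb": "u"}
# Consonant value as (without dagesh, with dagesh).
CONSONANTS = {"\u05d1": ("v", "b"), "\u05db": ("χ", "k"), "\u05e4": ("f", "p"),
              "\u05d9": ("j", "j"), "\u05d4": ("h", "h"), "\u05dc": ("l", "l"),
              "\u05de": ("m", "m"), "\u05e0": ("n", "n"), "\u05e8": ("ʁ", "ʁ")}
YOD, HE = "\u05d9", "\u05d4"


def split_letters(word: str) -> list[tuple[str, set[str]]]:
    """Group each base letter with the combining points that follow it."""
    letters: list[tuple[str, set[str]]] = []
    for ch in word:
        if unicodedata.combining(ch) and letters:
            letters[-1][1].add(ch)
        else:
            letters.append((ch, set()))
    return letters


def syllables(word: str) -> list[str]:
    """Build consonant+vowel syllables; unpointed letters become codas."""
    letters = split_letters(word)
    result: list[str] = []
    for i, (base, marks) in enumerate(letters):
        vowel = next((VOWEL_POINTS[m] for m in marks if m in VOWEL_POINTS), None)
        consonant = CONSONANTS[base][DAGESH in marks]
        if vowel is not None:
            result.append(consonant + vowel)
        elif base == YOD and result and result[-1].endswith("i"):
            continue  # yod marking /i/ is treated as silent
        elif base == HE and i == len(letters) - 1:
            continue  # word-final he is treated as a silent vowel letter
        elif result:
            result[-1] += consonant  # close the previous syllable
        else:
            result.append(consonant)
    return result


def to_ipa(word: str, stressed: int) -> str:
    """Join syllables, putting the IPA stress mark before syllable `stressed`."""
    return "".join(("ˈ" if i == stressed else "") + s
                   for i, s in enumerate(syllables(word)))


BIRA = "\u05d1\u05bc\u05b4\u05d9\u05e8\u05b8\u05d4"  # בִּירָה
print(to_ipa(BIRA, 0))  # ˈbiʁa  ('beer')
print(to_ipa(BIRA, 1))  # biˈʁa  ('capital')
```

The diacritized input is the same in both calls, and only the stress index changes the output. For this reason, a G2P system that stops at vowel diacritization cannot fully specify the pronunciation.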
Problem

Real-time Hebrew text-to-speech is hard because of the language's complex orthography
Existing Hebrew G2P systems do not specify stress and other phonetic features
Accurate IPA transcription is needed to train better Hebrew TTS models
Innovation

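Phonikud, a lightweight, open-source Hebrew G2P system that outputs fully specified IPA
Adapts an existing diacritization model with lightweight adapters, adding negligible latency
Introduces the ILSpeech dataset of Hebrew speech with IPA annotations, usable as a G2P benchmark and as TTS training data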
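
ILSpeech is described as a benchmark for Hebrew G2P. One common way to score a G2P system against reference IPA is a character-level edit-distance error rate, sketched below. The metric definition, the function names, and the two example transcriptions are illustrative assumptions, not the paper's evaluation protocol. Treating each Unicode character as one unit is a simplification, because multi-character phonemes such as affricates would count as several units.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance between two strings, computed one row at a time."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        cur = [i]
        for j, h in enumerate(hyp, start=1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (r != h)))  # substitution or match
        prev = cur
    return prev[-1]


def error_rate(refs: list[str], hyps: list[str]) -> float:
    """Total edits divided by total reference length (corpus-level rate)."""
    if len(refs) != len(hyps):
        raise ValueError("refs and hyps must have the same length")
    edits = sum(edit_distance(r, h) for r, h in zip(refs, hyps))
    return edits / sum(len(r) for r in refs)


refs = ["biˈʁa", "ʃaˈlom"]  # hypothetical reference IPA
hyps = ["ˈbiʁa", "ʃaˈlom"]  # hypothetical predictions; the first misplaces stress
print(edit_distance(refs[0], hyps[0]))   # 2
print(round(error_rate(refs, hyps), 3))  # 0.182
```

A misplaced stress mark costs two edits here (one deletion and one insertion). Removing "ˈ" from both strings before scoring gives a stress-insensitive variant, which separates segmental errors from stress errors.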
Authors
Yakov Kolani, Independent Researcher
Maxim Melichov, Reichman University
Cobi Calev, Independent Researcher
Morris Alper, Machine Learning Researcher (machine learning, computational linguistics, natural language processing, multimodal learning)