Phonikud: Hebrew Grapheme-to-Phoneme Conversion for Real-Time Text-to-Speech

📅 2025-06-14

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Modern Hebrew text-to-speech (TTS) is difficult because of the language's complex orthography, unmarked stress placement, and sparse vowel diacritization, which leave existing grapheme-to-phoneme (G2P) approaches unable to produce accurate, low-latency International Phonetic Alphabet (IPA) transcriptions. To address this, we propose Phonikud, a lightweight two-stage G2P framework: Stage I uses a pre-trained Hebrew diacritization model; Stage II adds compact neural adaptors, augmented with rule-based post-processing and IPA mapping. We also publicly release ILSpeech, a dataset of transcribed Hebrew speech with IPA annotations that serves as a G2P benchmark and as TTS training data. Our method outputs fully specified IPA transcriptions with negligible additional latency and predicts phonemes more accurately than prior G2P methods. It enables training of real-time Hebrew TTS models with a better speed-accuracy trade-off. All code, models, and data are open-sourced.

📝 Abstract

Real-time text-to-speech (TTS) for Modern Hebrew is challenging due to the language's orthographic complexity. Existing solutions ignore crucial phonetic features such as stress that remain underspecified even when vowel marks are added. To address these limitations, we introduce Phonikud, a lightweight, open-source Hebrew grapheme-to-phoneme (G2P) system that outputs fully-specified IPA transcriptions. Our approach adapts an existing diacritization model with lightweight adaptors, incurring negligible additional latency. We also contribute the ILSpeech dataset of transcribed Hebrew speech with IPA annotations, serving as a benchmark for Hebrew G2P and as training data for TTS systems. Our results demonstrate that Phonikud G2P conversion more accurately predicts phonemes from Hebrew text compared to prior methods, and that this enables training of effective real-time Hebrew TTS models with superior speed-accuracy trade-offs. We release our code, data, and models at https://phonikud.github.io.
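The two-stage idea described in the abstract (diacritize first, then produce fully specified IPA including stress) can be illustrated with a minimal sketch. Everything here is a hypothetical stand-in: the class `TwoStageG2P`, the functions `toy_diacritize` and `toy_to_ipa`, and the one-word lookup tables are assumptions made for illustration, not the Phonikud API or model. A real system would replace both lookups with the diacritization model and its adaptors.

```python
from dataclasses import dataclass
from typing import Callable

# Toy data for a single word ("shalom"). All three constants are
# illustrative assumptions, not data taken from the paper.
# Unvocalized spelling, as normally written:
PLAIN = "שלום"
# The same word with vowel marks added:
VOCALIZED = "שָׁלוֹם"
# Fully specified IPA, including the stress mark before the final syllable:
IPA = "ʃaˈlom"


def toy_diacritize(word: str) -> str:
    """Stage I stand-in: add vowel marks (unknown words pass through)."""
    return {PLAIN: VOCALIZED}.get(word, word)


def toy_to_ipa(vocalized: str) -> str:
    """Stage II stand-in: map a vocalized word to IPA with stress."""
    return {VOCALIZED: IPA}.get(vocalized, vocalized)


@dataclass
class TwoStageG2P:
    """Chains a diacritizer and an IPA mapper, word by word."""
    diacritize: Callable[[str], str]
    to_ipa: Callable[[str], str]

    def __call__(self, text: str) -> str:
        words = text.split()
        return " ".join(self.to_ipa(self.diacritize(w)) for w in words)


g2p = TwoStageG2P(toy_diacritize, toy_to_ipa)
print(g2p(PLAIN))
```

The point of the split is that vowel marks alone do not determine stress, so the second stage has to supply it before the IPA string is complete.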

Problem

Research questions and friction points this paper is trying to address.

Real-time Hebrew text-to-speech is hard because of the language's orthographic complexity

Existing Hebrew G2P systems ignore phonetic features such as stress, which remain underspecified even when vowel marks are added

Accurate, fully specified IPA transcription is needed to train better Hebrew TTS models

Innovation

Methods, ideas, or system contributions that make the work stand out.

Lightweight open-source Hebrew G2P system

Adapts an existing diacritization model with lightweight adaptors, adding negligible latency

Introduces ILSpeech dataset with IPA annotations
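A benchmark with reference IPA transcriptions allows G2P output to be scored symbol by symbol. The sketch below shows one common way to do that, a phoneme error rate based on Levenshtein edit distance. It is an assumption that a metric of this kind fits the ILSpeech benchmark; the exact evaluation protocol is not stated in this summary, and the function names are hypothetical.

```python
from typing import Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance between two symbol sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(
                prev[j] + 1,             # deletion
                curr[j - 1] + 1,         # insertion
                prev[j - 1] + (r != h),  # substitution or match
            ))
        prev = curr
    return prev[-1]


def phoneme_error_rate(ref: Sequence[str], hyp: Sequence[str]) -> float:
    """Edit distance normalized by the reference length."""
    if not ref:
        raise ValueError("reference transcription must not be empty")
    return edit_distance(ref, hyp) / len(ref)


# Each IPA character is treated as one symbol, so a missing stress
# mark counts as an error just like a wrong vowel or consonant.
reference = list("ʃaˈlom")
hypothesis = list("ʃalom")  # same phonemes, stress mark missing
print(phoneme_error_rate(reference, hypothesis))
```

Treating the stress mark as a scored symbol means a system that gets every consonant and vowel right but omits stress still receives a nonzero error.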
