Habibi: Laying the Open-Source Foundation of Unified-Dialectal Arabic Speech Synthesis

📅 2026-01-20
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the limited progress in Arabic multi-dialectal text-to-speech (TTS) synthesis, which has been hindered by the absence of unified modeling approaches, standardized datasets, and evaluation benchmarks. The authors propose the first open-source, end-to-end TTS system designed specifically for multi-dialectal Arabic, leveraging publicly available automatic speech recognition (ASR) corpora to construct a unified training framework. By integrating linguistically informed curriculum learning and in-context learning mechanisms, the system enables high-quality synthesis across both high- and low-resource dialects without relying on diacritized input text. The study establishes the first open-source TTS models and evaluation benchmark for multi-dialectal Arabic, demonstrates synthesis quality surpassing the leading commercial service, and advances standardization and reproducibility in the field.

📝 Abstract
A notable gap persists in speech synthesis research and development for Arabic dialects, particularly from a unified modeling perspective. Despite its high practical value, the inherent linguistic complexity of Arabic dialects, further compounded by a lack of standardized data, benchmarks, and evaluation guidelines, steers researchers toward safer ground. To bridge this divide, we present Habibi, a suite of specialized and unified text-to-speech models that harnesses existing open-source ASR corpora to support a wide range of high- to low-resource Arabic dialects through linguistically-informed curriculum learning. Our approach outperforms the leading commercial service in generation quality, while maintaining extensibility through effective in-context learning, without requiring text diacritization. We are committed to open-sourcing the model, along with creating the first systematic benchmark for multi-dialect Arabic speech synthesis. Furthermore, by identifying the key challenges in and establishing evaluation standards for the process, we aim to provide a solid groundwork for subsequent research. Resources at https://SWivid.github.io/Habibi/ .
Problem

Research questions and friction points this paper is trying to address.

Arabic dialects
speech synthesis
unified modeling
benchmark
data standardization
Innovation

Methods, ideas, or system contributions that make the work stand out.

Unified Arabic TTS
Curriculum Learning
In-Context Learning
Dialectal Speech Synthesis
Open-Source Benchmark
👥 Authors

Yushen Chen
Shanghai Jiao Tong University
Speech and Language Processing

Junzhe Liu
X-LANCE Lab, School of Computer Science, MoE Key Lab of Artificial Intelligence, Jiangsu Key Lab of Language Computing, Shanghai Jiao Tong University

Yujie Tu
Shanghai Innovation Institute; University of Chinese Academy of Sciences

Zhikang Niu
Shanghai Jiao Tong University
Speech Synthesis

Yuzhe Liang
Shanghai Jiao Tong University
Deep Learning, Multimodal Learning

Kai Yu
X-LANCE Lab, School of Computer Science, MoE Key Lab of Artificial Intelligence, Jiangsu Key Lab of Language Computing, Shanghai Jiao Tong University

Chunyu Qiang
Kuaishou Technology; TJU; CASIA
Speech Synthesis

Chen Zhang
Kuaishou Technology

Xie Chen
Shanghai Jiao Tong University (previously Microsoft and Cambridge University)
Machine Learning, Speech Recognition, Speech Synthesis, Speech & Audio Processing