ArVoice: A Multi-Speaker Dataset for Arabic Speech Synthesis

📅 2025-05-26

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Open-source multi-speaker resources for Modern Standard Arabic (MSA) text-to-speech (TTS) remain scarce, particularly ones with diacritized transcriptions. This work introduces ArVoice, a multi-speaker MSA speech corpus with diacritized transcriptions that combines newly recorded professional speech from six voice talents, a modified subset of the Arabic Speech Corpus, and high-quality synthetic speech from two commercial systems. The corpus totals 83.52 hours of audio across 11 voices, of which around 10 hours are human speech from 7 speakers. Beyond multi-speaker TTS, the authors position ArVoice as useful for speech-based diacritic restoration, voice conversion (VC), and deepfake detection. They illustrate these use cases by training three open-source TTS systems and two VC systems on the data. The corpus is available for research use.

📝 Abstract

We introduce ArVoice, a multi-speaker Modern Standard Arabic (MSA) speech corpus with diacritized transcriptions. It is intended for multi-speaker speech synthesis and can also be useful for other tasks such as speech-based diacritic restoration, voice conversion, and deepfake detection. ArVoice comprises: (1) a new professionally recorded set from six voice talents with diverse demographics; (2) a modified subset of the Arabic Speech Corpus; and (3) high-quality synthetic speech from two commercial systems. The complete corpus consists of 83.52 hours of speech across 11 voices; around 10 hours are human speech from 7 speakers. We train three open-source TTS and two voice conversion systems to illustrate the use cases of the dataset. The corpus is available for research use.
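
The figures in the abstract imply a split between human and synthetic speech that is not stated outright. The sketch below works it out, assuming that every voice and hour not attributed to human speakers comes from the two commercial synthetic systems. The function name `corpus_breakdown` and its dictionary keys are illustrative, and because the abstract gives the human portion only as "around 10 hours", the derived values are approximate.

```python
# Figures taken from the abstract.
TOTAL_HOURS = 83.52
TOTAL_VOICES = 11
HUMAN_HOURS_APPROX = 10.0  # stated as "around 10 hours", so approximate
HUMAN_SPEAKERS = 7


def corpus_breakdown(total_hours, total_voices, human_hours, human_speakers):
    """Derive the synthetic portion of the corpus from the stated totals.

    Assumption: everything that is not human speech is synthetic speech.
    """
    return {
        "synthetic_hours": round(total_hours - human_hours, 2),
        "synthetic_voices": total_voices - human_speakers,
        "human_share": round(human_hours / total_hours, 3),
    }


breakdown = corpus_breakdown(
    TOTAL_HOURS, TOTAL_VOICES, HUMAN_HOURS_APPROX, HUMAN_SPEAKERS
)
print(breakdown)
```

Under these assumptions, roughly 73.5 hours and 4 of the 11 voices are synthetic, so human recordings make up about 12% of the audio.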

Problem

Research questions and friction points this paper is trying to address.

Creating a multi-speaker Arabic speech dataset for synthesis

Providing diacritized transcriptions for diverse Arabic speech tasks

Enabling research in TTS and voice conversion for Arabic

Innovation

Methods, ideas, or system contributions that make the work stand out.

Multi-speaker Arabic speech corpus with diacritics

Combines new professional recordings, a modified subset of the Arabic Speech Corpus, and synthetic speech

Trains three open-source TTS systems and two voice conversion systems to illustrate use cases
