ArVoice: A Multi-Speaker Dataset for Arabic Speech Synthesis

📅 2025-05-26
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
To address the longstanding scarcity of high-quality, phonemically annotated open-source multi-speaker datasets for Modern Standard Arabic (MSA) text-to-speech (TTS), this work introduces ArVoice, described as the first publicly available multi-speaker MSA TTS dataset. ArVoice integrates professionally recorded speech, a curated open-source corpus, and high-fidelity synthetic speech generated by commercial TTS systems, encompassing 11 speakers and 83.52 hours of audio, all accompanied by phoneme-level forced alignments. It is presented as the first dataset to provide both multi-speaker MSA speech and corresponding phonemic annotations under an open license. ArVoice is intended to support research in MSA TTS, phoneme recovery, voice conversion (VC), and deepfake detection. Its usefulness is illustrated by training state-of-the-art TTS models (e.g., FastSpeech2, VITS) and VC systems on the data. The dataset is publicly released for non-commercial academic use.

📝 Abstract
We introduce ArVoice, a multi-speaker Modern Standard Arabic (MSA) speech corpus with diacritized transcriptions. The corpus is intended for multi-speaker speech synthesis and can also be useful for other tasks such as speech-based diacritic restoration, voice conversion, and deepfake detection. ArVoice comprises: (1) a new professionally recorded set from six voice talents with diverse demographics; (2) a modified subset of the Arabic Speech Corpus; and (3) high-quality synthetic speech from two commercial systems. The complete corpus totals 83.52 hours of speech across 11 voices, of which around 10 hours are human speech from 7 speakers. We train three open-source TTS systems and two voice conversion systems to illustrate the use cases of the dataset. The corpus is available for research use.
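
The composition figures in the abstract imply a few numbers it does not state outright. The sketch below derives them by simple arithmetic. The constant and function names are illustrative, and the human-speech figure is the abstract's approximate "around 10 hours", so the derived hours and share are approximate too.

```python
# Figures quoted in the abstract.
TOTAL_HOURS = 83.52          # complete corpus
TOTAL_VOICES = 11            # human + synthetic voices
HUMAN_HOURS_APPROX = 10.0    # "around 10 hours" (approximate)
HUMAN_SPEAKERS = 7           # human speakers overall
NEW_VOICE_TALENTS = 6        # newly recorded professional voice talents


def corpus_breakdown():
    """Derive the quantities the abstract leaves implicit."""
    synthetic_hours = TOTAL_HOURS - HUMAN_HOURS_APPROX
    return {
        # Human speakers not covered by the new recordings, i.e. those
        # coming from the Arabic Speech Corpus subset.
        "existing_corpus_speakers": HUMAN_SPEAKERS - NEW_VOICE_TALENTS,
        # Voices produced by the two commercial synthesis systems.
        "synthetic_voices": TOTAL_VOICES - HUMAN_SPEAKERS,
        "synthetic_hours_approx": round(synthetic_hours, 2),
        "human_share_approx": round(HUMAN_HOURS_APPROX / TOTAL_HOURS, 3),
    }


if __name__ == "__main__":
    for name, value in corpus_breakdown().items():
        print(f"{name}: {value}")
```

Read this way, most of the corpus by duration is synthetic speech, which matters when choosing which subset to train or evaluate on.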

Problem

Research questions and friction points this paper is trying to address.

Creating a multi-speaker Arabic speech dataset for synthesis
Providing diacritized transcriptions for diverse Arabic speech tasks
Enabling research in TTS and voice conversion for Arabic
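
As a rough illustration of why diacritized transcriptions matter for diacritic restoration, the sketch below strips Arabic short-vowel marks from a diacritized string to form an (undiacritized input, diacritized target) pair. This is not the authors' pipeline. The function names are hypothetical, and the set of code points treated as diacritics (U+064B to U+0652 plus U+0670) is an assumption that a real system might widen or narrow.

```python
import re

# Assumed diacritic set: tanween, short vowels, shadda, sukun
# (U+064B to U+0652) and the superscript alef (U+0670).
ARABIC_DIACRITICS = re.compile("[\u064B-\u0652\u0670]")


def strip_diacritics(text: str) -> str:
    """Return the text with the assumed Arabic diacritics removed."""
    return ARABIC_DIACRITICS.sub("", text)


def make_restoration_pair(diacritized: str) -> tuple[str, str]:
    """Build an (input, target) pair for a diacritic restoration model."""
    return strip_diacritics(diacritized), diacritized


if __name__ == "__main__":
    # "kataba" (he wrote), fully diacritized with three fathas.
    source, target = make_restoration_pair("\u0643\u064E\u062A\u064E\u0628\u064E")
    print(source, "->", target)
```

In a speech-based setting, the audio for each utterance would accompany the stripped text as additional input; that part is outside this sketch.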

Innovation

Methods, ideas, or system contributions that make the work stand out.

Multi-speaker Arabic speech corpus with diacritics
Combines professional recordings and synthetic speech
Trains three open-source TTS systems and two voice conversion systems to illustrate the dataset's use cases