RASMALAI: Resources for Adaptive Speech Modeling in Indian Languages with Accents and Intonations

📅 2025-05-24

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses the limited fine-grained control over accent, emotion, and speaking style in multilingual text-to-speech (TTS) systems for Indian languages and English. To this end, we introduce RASMALAI, a large-scale, multi-attribute speech dataset spanning 23 Indian languages and English, comprising 13,000 hours of speech and 24 million fine-grained textual attribute annotations. We also release IndicParlerTTS, the first open-source, text-description-driven TTS system tailored for Indian languages. Our approach employs text-description conditioning, multi-task attribute disentanglement, and cross-lingual representation sharing to enable robust cross-lingual and cross-speaker emotion, accent, and style transfer. Extensive evaluation demonstrates state-of-the-art performance across key metrics, including named-speaker synthesis fidelity, description adherence, attribute accuracy, and cross-lingual expressive transfer, establishing a new benchmark for controllable multilingual TTS in Indian languages.

📝 Abstract

We introduce RASMALAI, a large-scale speech dataset with rich text descriptions, designed to advance controllable and expressive text-to-speech (TTS) synthesis for 23 Indian languages and English. It comprises 13,000 hours of speech and 24 million text-description annotations with fine-grained attributes such as speaker identity, accent, emotion, style, and background conditions. Using RASMALAI, we develop IndicParlerTTS, the first open-source, text-description-guided TTS system for Indian languages. Systematic evaluation demonstrates its ability to generate high-quality speech for named speakers, reliably follow text descriptions, and accurately synthesize specified attributes. Additionally, it effectively transfers expressive characteristics both within and across languages. IndicParlerTTS consistently achieves strong performance across these evaluations, setting a new standard for controllable multilingual expressive speech synthesis in Indian languages.

Problem

Research questions and friction points this paper is trying to address.

Advancing controllable TTS for 23 Indian languages and English

Creating a large-scale dataset with fine-grained speech attributes

Developing open-source expressive TTS for Indian languages

Innovation

Methods, ideas, or system contributions that make the work stand out.

Large-scale dataset with rich text descriptions

First open-source text-description-guided TTS

Accurate synthesis of specified speech attributes
