BnTTS: Few-Shot Speaker Adaptation in Low-Resource Setting

📅 2025-02-09
🤖 AI Summary
Few-shot text-to-speech (TTS) for low-resource Bengali suffers from weak speaker adaptation, low naturalness, and poor speaker fidelity. This paper proposes BnTTS, the first few-shot TTS framework designed specifically for Bengali. The authors adapt the multilingual TTS model XTTS to Bengali by modifying the architecture to account for the language's phonetic and linguistic characteristics and pretraining it on 3.85k hours of Bengali speech-text pairs. Leveraging multilingual transfer learning and speaker embedding fine-tuning, BnTTS supports both zero-shot and few-shot speaker adaptation. In few-shot settings, BnTTS outperforms state-of-the-art Bengali TTS systems in Subjective Mean Opinion Score (SMOS), naturalness, and clarity. The work addresses the lack of high-quality, speaker-adaptive TTS for a low-resource language.

📝 Abstract
This paper introduces BnTTS (Bangla Text-To-Speech), the first framework for Bangla speaker adaptation-based TTS, designed to bridge the gap in Bangla speech synthesis using minimal training data. Building upon the XTTS architecture, our approach integrates Bangla into a multilingual TTS pipeline, with modifications to account for the phonetic and linguistic characteristics of the language. We pre-train BnTTS on a 3.85k-hour Bangla speech dataset with corresponding text labels and evaluate its performance in both zero-shot and few-shot settings on our proposed test dataset. Empirical evaluations in few-shot settings show that BnTTS significantly improves the naturalness, intelligibility, and speaker fidelity of synthesized Bangla speech. Compared to state-of-the-art Bangla TTS systems, BnTTS exhibits superior performance in Subjective Mean Opinion Score (SMOS), Naturalness, and Clarity metrics.
Problem

Research questions and friction points this paper is trying to address.

Develops BnTTS for Bangla speech synthesis.
Adapts XTTS for Bangla phonetic characteristics.
Enhances naturalness and speaker fidelity in synthesis.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Adapts TTS to new Bangla speakers with minimal training data.
Integrates Bangla into a multilingual TTS pipeline.
Enhances speech naturalness and clarity.
Authors

Mohammad Jahid Ibna Basher
Hishab Singapore Pte. Ltd, Singapore
Md. Kowsher
University of Central Florida, USA
Md Saiful Islam
Hishab Singapore Pte. Ltd, Singapore
R. N. Nandi
Hishab Singapore Pte. Ltd, Singapore
Nusrat Jahan Prottasha
Graduate Research Assistant, Stevens Institute of Technology
Mehadi Hasan Menon
Hishab Singapore Pte. Ltd, Singapore
Tareq Al Muntasir
Chief Technology Officer, Verbex.ai (formerly Hishab)
Shammur Absar Chowdhury
Qatar Computing Research Institute
Firoj Alam
Qatar Computing Research Institute, Qatar
Niloofar Yousefi
Assistant Professor
O. Garibay
University of Central Florida, USA