BnTTS: Few-Shot Speaker Adaptation in Low-Resource Setting

📅 2025-02-09

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Few-shot text-to-speech (TTS) for low-resource Bangla (Bengali) suffers from weak speaker adaptation, low naturalness, and poor speaker fidelity. This paper proposes BnTTS, which the authors describe as the first few-shot TTS framework designed specifically for Bangla. The authors adapt the multilingual TTS model XTTS to Bangla by modifying the architecture to account for the language's phonetic and linguistic characteristics, then pre-train it on 3.85k hours of Bangla speech-text pairs. Using multilingual transfer learning and speaker embedding fine-tuning, BnTTS supports both zero-shot and few-shot speaker adaptation. In the reported experiments, BnTTS outperforms state-of-the-art Bangla TTS systems on Subjective Mean Opinion Score (SMOS), naturalness, and clarity. The work addresses a gap in high-quality, adaptive TTS for low-resource languages.

📝 Abstract

This paper introduces BnTTS (Bangla Text-To-Speech), the first framework for speaker-adaptation-based Bangla TTS, designed to bridge the gap in Bangla speech synthesis using minimal training data. Building upon the XTTS architecture, our approach integrates Bangla into a multilingual TTS pipeline, with modifications to account for the phonetic and linguistic characteristics of the language. We pre-train BnTTS on a 3.85k-hour Bangla speech dataset with corresponding text labels and evaluate its performance in both zero-shot and few-shot settings on our proposed test dataset. Empirical evaluations in few-shot settings show that BnTTS significantly improves the naturalness, intelligibility, and speaker fidelity of synthesized Bangla speech. Compared to state-of-the-art Bangla TTS systems, BnTTS achieves superior performance on the Subjective Mean Opinion Score (SMOS), Naturalness, and Clarity metrics.
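
The abstract reports results as subjective opinion scores. As a rough illustration of how such scores are commonly aggregated, the sketch below averages listener ratings on a 1-to-5 scale and attaches an approximate 95% confidence interval. It is not the authors' evaluation protocol: the function name `mean_opinion_score`, the system names, and all ratings are hypothetical and are not taken from the paper.

```python
import math
import statistics


def mean_opinion_score(ratings, z=1.96):
    """Return (mean, half-width of an approximate 95% confidence interval).

    Assumes ratings are independent listener scores on a 1-5 scale and
    uses a normal approximation for the interval.
    """
    if len(ratings) < 2:
        raise ValueError("need at least two ratings")
    if any(r < 1 or r > 5 for r in ratings):
        raise ValueError("ratings must be on the 1-5 scale")
    mean = statistics.mean(ratings)
    # Standard error of the mean, scaled by the z value for ~95% coverage.
    half_width = z * statistics.stdev(ratings) / math.sqrt(len(ratings))
    return mean, half_width


# Hypothetical ratings for two unnamed systems; not data from the paper.
ratings_by_system = {
    "system_a": [4, 5, 4, 4, 5, 3, 4, 5],
    "system_b": [3, 3, 4, 2, 3, 4, 3, 3],
}

for name, ratings in ratings_by_system.items():
    mos, ci = mean_opinion_score(ratings)
    print(f"{name}: MOS = {mos:.2f} +/- {ci:.2f}")
```

Reporting the interval alongside the mean helps show whether a gap between two systems is larger than the listener-to-listener variation, which matters when the rating pool is small.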

Problem

Research questions and friction points this paper is trying to address.

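Bangla lacks a TTS framework that can adapt to new speakers from minimal training data.

XTTS must be modified to account for Bangla's phonetic and linguistic characteristics.

Synthesized Bangla speech needs better naturalness, intelligibility, and speaker fidelity.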

Innovation

Methods, ideas, or system contributions that make the work stand out.

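Adapts Bangla TTS to new speakers in zero-shot and few-shot settings using minimal training data.

Integrates Bangla into the multilingual XTTS pipeline, with modifications for the language's phonetic and linguistic characteristics.

Improves speech naturalness and clarity over existing Bangla TTS systems in the reported evaluations.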
