🤖 AI Summary
Speech translation for Telugu (over 80 million speakers) has long lacked high-quality benchmark data and faces the usual low-resource modeling challenges. To address this, we introduce the first open-source Telugu–English speech translation benchmark, comprising 30/8/8 hours of training/validation/test data curated from 46 hours of human-verified CSTD speech. We systematically compare an IndicWhisper + IndicMT cascade with a fine-tuned end-to-end SeamlessM4T model: the cascade scores highest, benefiting from extensive Telugu-specific training data, but the end-to-end model remains competitive despite far less Telugu-specific data, suggesting comparable performance may be attainable with under ~100 hours of parallel speech–text data. Our evaluation further shows that conventional metrics (e.g., BLEU) discriminate translation quality more reliably than BERTScore for this language pair, and we distill best practices for automatic evaluation of morphologically rich languages. This work fills a critical gap in the field, establishing a reproducible benchmark and methodological foundation for low-resource speech translation.
📝 Abstract
Despite Telugu being spoken by over 80 million people, speech translation research for this morphologically rich language remains severely underexplored. We address this gap by developing a high-quality Telugu--English speech translation benchmark from 46 hours of manually verified CSTD corpus data (30h/8h/8h train/dev/test split). Our systematic comparison of cascaded versus end-to-end architectures shows that while IndicWhisper + IndicMT achieves the highest performance due to extensive Telugu-specific training data, fine-tuned SeamlessM4T models remain remarkably competitive despite using significantly less Telugu-specific training data. This finding suggests that with careful hyperparameter tuning and sufficient parallel data (potentially less than 100 hours), end-to-end systems can achieve performance comparable to cascaded approaches in low-resource settings. Our metric reliability study, which evaluates BLEU, METEOR, chrF++, ROUGE-L, TER, and BERTScore against human judgments, reveals that traditional metrics provide better quality discrimination than BERTScore for Telugu--English translation. The work delivers three key contributions: a reproducible Telugu--English benchmark, empirical evidence of competitive end-to-end performance potential in low-resource scenarios, and practical guidance for automatic evaluation in morphologically complex language pairs.
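To make the metric discussion concrete, the sketch below implements a simplified sentence-level character n-gram F-score in the spirit of chrF (without the word-order "++" component and other refinements of the full metric). Character-level matching is one reason such metrics behave well for morphologically rich languages like Telugu: partial credit is given for shared stems even when inflected surface forms differ. This is an illustrative sketch only, not the paper's evaluation pipeline; published results should use a standard implementation such as sacreBLEU.

```python
from collections import Counter

def char_ngrams(text: str, n: int) -> Counter:
    """Count character n-grams, ignoring whitespace (as chrF does by default)."""
    s = text.replace(" ", "")
    return Counter(s[i:i + n] for i in range(len(s) - n + 1))

def chrf_sketch(hypothesis: str, reference: str, max_n: int = 6, beta: float = 2.0) -> float:
    """Simplified chrF: average F_beta over character n-gram orders 1..max_n.

    beta=2 weights recall twice as much as precision, matching chrF's default.
    Returns a score in [0, 100].
    """
    f_scores = []
    for n in range(1, max_n + 1):
        hyp, ref = char_ngrams(hypothesis, n), char_ngrams(reference, n)
        hyp_total, ref_total = sum(hyp.values()), sum(ref.values())
        if hyp_total == 0 or ref_total == 0:
            continue  # strings too short for this n-gram order
        matched = sum((hyp & ref).values())  # clipped n-gram overlap
        precision = matched / hyp_total
        recall = matched / ref_total
        if precision + recall == 0:
            f_scores.append(0.0)
            continue
        f_scores.append((1 + beta**2) * precision * recall
                        / (beta**2 * precision + recall))
    return 100 * sum(f_scores) / len(f_scores) if f_scores else 0.0
```

An exact match scores 100, while an unrelated hypothesis scores near 0, illustrating the kind of graded surface-overlap signal that the study found more discriminative than BERTScore for this language pair.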