NaijaS2ST: A Multi-Accent Benchmark for Speech-to-Speech Translation in Low-Resource Nigerian Languages

📅 2026-04-17

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses speech-to-speech translation for four low-resource Nigerian languages (Igbo, Hausa, Yorùbá, and Nigerian Pidgin), where progress has been hindered by the scarcity of high-quality, multi-accent parallel speech data. The authors present NaijaS2ST, a parallel speech translation dataset that pairs these languages with English and contains approximately 50 hours of speech per language, and they systematically evaluate cascaded, end-to-end, and AudioLLM-based approaches on bidirectional translation tasks. Few-shot AudioLLMs outperform fine-tuned cascaded and end-to-end models on speech-to-text translation. On speech-to-speech translation, however, cascaded systems and AudioLLMs perform comparably, which leaves considerable room for models designed specifically for this task.

📝 Abstract

Speech translation for low-resource languages remains fundamentally limited by the scarcity of high-quality, diverse parallel speech data, a challenge that is especially pronounced in African linguistic contexts. To address this, we introduce NaijaS2ST, a parallel speech translation dataset spanning Igbo, Hausa, Yorùbá, and Nigerian Pidgin paired with English. The dataset comprises approximately 50 hours of speech per language and captures substantial variation in speakers and accents, reflecting realistic multilingual and multi-accent conditions. With NaijaS2ST, we conduct a comprehensive benchmark of cascaded, end-to-end (E2E), and AudioLLM-based approaches across bidirectional translation settings. Our results show that AudioLLMs prompted with few-shot examples are more effective for speech-to-text translation than fine-tuned cascaded and end-to-end methods. However, for speech-to-speech translation, the cascaded and AudioLLM paradigms yield comparable performance, indicating that there is still considerable room for improvement in developing targeted, task-specific models for this setting. By providing both a high-quality dataset and a systematic benchmark, we hope that NaijaS2ST will serve as a strong foundation for advancing research in low-resource, multilingual speech translation.
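
To make the difference between the benchmarked paradigms concrete, the following is a minimal sketch, not the authors' implementation. It assumes a cascaded system is a chain of three separate components (speech recognition, text translation, speech synthesis), while an end-to-end or AudioLLM-based system is a single model that maps source audio to target audio. Every name here (`Audio`, `cascaded_s2st`, `direct_s2st`, the `toy_*` stubs, and the one-entry lexicon) is hypothetical and exists only for illustration; the language codes `pcm` (Nigerian Pidgin) and `eng` (English) are ISO 639-3 codes chosen for the example.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Audio:
    """Stand-in for a waveform; a real system would hold samples and a sample rate."""
    content: str  # the spoken words, used here in place of audio samples
    lang: str     # language code of the speech


# Hypothetical component interfaces (assumptions, not from the paper).
ASR = Callable[[Audio], str]            # speech -> source-language text
MT = Callable[[str, str, str], str]     # (text, src_lang, tgt_lang) -> target text
TTS = Callable[[str, str], Audio]       # (text, lang) -> speech
DirectModel = Callable[[Audio, str], Audio]  # speech -> speech in one step


def cascaded_s2st(audio: Audio, tgt_lang: str, asr: ASR, mt: MT, tts: TTS) -> Audio:
    """Cascaded paradigm: recognise, translate, then synthesise."""
    transcript = asr(audio)
    translation = mt(transcript, audio.lang, tgt_lang)
    return tts(translation, tgt_lang)


def direct_s2st(audio: Audio, tgt_lang: str, model: DirectModel) -> Audio:
    """End-to-end or AudioLLM paradigm: one model, no intermediate text handoff."""
    return model(audio, tgt_lang)


# Toy stubs so the sketch runs; real components would be trained models.
TOY_LEXICON = {("pcm", "eng"): {"how you dey": "how are you"}}


def toy_asr(audio: Audio) -> str:
    return audio.content


def toy_mt(text: str, src: str, tgt: str) -> str:
    # Fall back to the input text when the toy lexicon has no entry.
    return TOY_LEXICON.get((src, tgt), {}).get(text, text)


def toy_tts(text: str, lang: str) -> Audio:
    return Audio(content=text, lang=lang)


result = cascaded_s2st(Audio("how you dey", "pcm"), "eng", toy_asr, toy_mt, toy_tts)
print(result)
```

In a cascade, an error made at the recognition step is passed on to translation and synthesis, whereas a direct model has no intermediate text to inspect or correct. The sketch only illustrates this structural difference and says nothing about which design performs better on NaijaS2ST.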

Problem

Research questions and friction points this paper is trying to address.

low-resource languages

speech-to-speech translation

parallel speech data

multilingual

African languages

Innovation

Methods, ideas, or system contributions that make the work stand out.

low-resource speech translation

multi-accent dataset

AudioLLM