Speech-to-Speech Translation Pipelines for Conversations in Low-Resource Languages

📅 2025-06-02
🤖 AI Summary
Speech-to-speech translation (S2ST) quality varies considerably across language pairs, and low-resource languages in community interpreting settings, here Turkish and Pashto to/from French, are particularly affected. Method: We collect fine-tuning and test data, then build and evaluate over 60 cascaded pipelines that chain automatic speech recognition (ASR), machine translation (MT), and text-to-speech synthesis (TTS), combining local models (some fine-tuned on the collected data) with commercial cloud-based systems; evaluation pairs automatic metrics (BLEU, COMET, BLASER) with human assessment. Contribution/Results: We identify the best pipeline for each translation direction and find that the ranking of components is generally independent of the rest of the pipeline, which suggests that each module can be selected or tuned separately.

📝 Abstract
The popularity of automatic speech-to-speech translation for human conversations is growing, but the quality varies significantly depending on the language pair. In a context of community interpreting for low-resource languages, namely Turkish and Pashto to/from French, we collected fine-tuning and testing data, and compared systems using several automatic metrics (BLEU, COMET, and BLASER) and human assessments. The pipelines included automatic speech recognition, machine translation, and speech synthesis, with local models and cloud-based commercial ones. Some components have been fine-tuned on our data. We evaluated over 60 pipelines and determined the best one for each direction. We also found that the ranks of components are generally independent of the rest of the pipeline.
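The pipeline space described in the abstract can be pictured as the Cartesian product of the available ASR, MT, and TTS systems. The following is a minimal sketch of that enumeration, not the authors' code: every component name is a hypothetical placeholder, and the inventory sizes are chosen only for illustration.

```python
from itertools import product

# Hypothetical component inventories: placeholder names, not the systems
# evaluated in the paper. Sizes are illustrative only.
ASR_SYSTEMS = ["asr_local", "asr_local_finetuned", "asr_cloud_a", "asr_cloud_b"]
MT_SYSTEMS = ["mt_local", "mt_local_finetuned", "mt_cloud_a", "mt_cloud_b"]
TTS_SYSTEMS = ["tts_local", "tts_cloud_a", "tts_cloud_b", "tts_cloud_c"]


def enumerate_pipelines(asr_systems, mt_systems, tts_systems):
    """Return every (ASR, MT, TTS) combination as a tuple."""
    return list(product(asr_systems, mt_systems, tts_systems))


pipelines = enumerate_pipelines(ASR_SYSTEMS, MT_SYSTEMS, TTS_SYSTEMS)
print(len(pipelines))   # number of candidate pipelines
print(pipelines[0])     # first combination in enumeration order
```

With four options per stage the product already yields 64 candidates, which shows how quickly a modest set of components grows into the dozens of pipelines that then have to be scored.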
Problem

Research questions and friction points this paper is trying to address.

Improving speech-to-speech translation for low-resource languages
Evaluating pipeline components for Turkish and Pashto to/from French
Determining optimal translation pipelines using metrics and human assessments
Innovation

Methods, ideas, or system contributions that make the work stand out.

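Combined local models, some fine-tuned on the collected data, with cloud-based commercial ones
Evaluated over 60 pipelines with automatic metrics (BLEU, COMET, BLASER) and human assessments
Found that component rankings are generally independent of the rest of the pipeline, so components can be compared separately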
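One simple way to probe whether component rankings depend on the rest of the pipeline is to rank the systems of one stage separately under each choice of another stage and compare the orderings. The sketch below does this for MT systems grouped by ASR system. It is an assumption about how such a check could be done, not the procedure used in the paper; the system names and scores are invented for illustration.

```python
from collections import defaultdict

# Invented scores for illustration only (higher is better);
# these are NOT results from the paper.
SCORES = {
    ("asr_a", "mt_x"): 31.0, ("asr_a", "mt_y"): 27.5, ("asr_a", "mt_z"): 22.1,
    ("asr_b", "mt_x"): 25.4, ("asr_b", "mt_y"): 23.0, ("asr_b", "mt_z"): 18.9,
}


def mt_rankings_by_asr(scores):
    """For each ASR system, list the MT systems from best to worst score."""
    by_asr = defaultdict(dict)
    for (asr, mt), score in scores.items():
        by_asr[asr][mt] = score
    return {
        asr: sorted(mt_scores, key=mt_scores.get, reverse=True)
        for asr, mt_scores in by_asr.items()
    }


def ranks_are_independent(rankings):
    """True if every upstream context yields the same ordering."""
    orderings = {tuple(order) for order in rankings.values()}
    return len(orderings) <= 1


rankings = mt_rankings_by_asr(SCORES)
print(rankings)
print(ranks_are_independent(rankings))
```

An exact-match test like this is strict. Since the abstract reports that ranks are "generally" independent, a rank correlation between orderings would be a softer alternative to the all-or-nothing comparison shown here.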
Andrei Popescu-Belis
Professor at HEIG-VD (HES-SO) - Senior Scientist at EPFL
Natural Language Processing, Machine Translation, Language Resources, Language Technology
Alexis Allemann
HEIG-VD / HES-SO, 1401 Yverdon-les-Bains, Switzerland
Teo Ferrari
HEIG-VD / HES-SO, 1401 Yverdon-les-Bains, Switzerland
Gopal Krishnamani
Bhaasha Sàrl, 1400 Yverdon-les-Bains, Switzerland