From Scarcity to Efficiency: Investigating the Effects of Data Augmentation on African Machine Translation

📅 2025-09-09
🤖 AI Summary
Machine translation (MT) for low-resource African languages is severely constrained by scarce parallel training data. This paper investigates two data augmentation techniques, sentence concatenation with back-translation and switch-out, within a neural MT framework, with experiments across six low-resource African languages. The reported results show BLEU score improvements of at least 25% on all six languages. The work offers empirical evidence and methodological insights for designing data augmentation pipelines for African language MT.

📝 Abstract
The linguistic diversity across the African continent presents different challenges and opportunities for machine translation. This study explores the effects of data augmentation techniques in improving translation systems in low-resource African languages. We focus on two data augmentation techniques: sentence concatenation with back translation and switch-out, applying them across six African languages. Our experiments show significant improvements in machine translation performance, with a minimum increase of 25% in BLEU score across all six languages. We provide a comprehensive analysis and highlight the potential of these techniques to improve machine translation systems for low-resource languages, contributing to the development of more robust translation systems for under-resourced languages.
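The abstract does not say whether the 25% figure is a relative or an absolute BLEU gain. As a minimal sketch, assuming it is a relative gain over the non-augmented baseline, the calculation would look like this. The function name and the scores are hypothetical illustrations, not results from the paper.

```python
def relative_bleu_gain(baseline_bleu: float, augmented_bleu: float) -> float:
    """Return the relative BLEU improvement over the baseline, in percent."""
    if baseline_bleu <= 0:
        raise ValueError("baseline BLEU must be positive")
    return (augmented_bleu - baseline_bleu) / baseline_bleu * 100.0


# Hypothetical scores, chosen only to illustrate the arithmetic.
gain = relative_bleu_gain(baseline_bleu=10.0, augmented_bleu=12.5)
print(f"relative gain: {gain:.1f}%")
```

Under this reading, a baseline of 10.0 BLEU would need to reach at least 12.5 BLEU to meet the stated minimum.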
Problem

Research questions and friction points this paper is trying to address.

Improving machine translation for low-resource African languages
Investigating data augmentation effects on translation performance
Addressing linguistic diversity challenges with sentence concatenation techniques
Innovation

Methods, ideas, or system contributions that make the work stand out.

Data augmentation techniques for low-resource languages
Sentence concatenation with back translation method
Switch-out technique improving translation performance significantly
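
The listing names the techniques but does not give their implementation details. The sketch below shows one common way such augmentations are written. It rests on assumptions: `concat_pairs` joins two randomly chosen parallel pairs into one longer pair, and `switch_out_simplified` is a simplified variant that replaces each token independently with a random vocabulary token, not the sampling scheme of the original SwitchOut method. All names and parameters here are hypothetical. Back-translation is omitted because it needs a trained target-to-source model.

```python
import random


def concat_pairs(pairs, n_new, sep=" ", rng=None):
    """Create n_new synthetic pairs by joining two distinct parallel pairs.

    pairs: list of (source_sentence, target_sentence) tuples.
    """
    rng = rng or random.Random()
    augmented = []
    for _ in range(n_new):
        # Sample two different pairs and join source with source, target with target.
        (src1, tgt1), (src2, tgt2) = rng.sample(pairs, 2)
        augmented.append((src1 + sep + src2, tgt1 + sep + tgt2))
    return augmented


def switch_out_simplified(tokens, vocab, p=0.1, rng=None):
    """Replace each token with a uniformly random vocabulary token with probability p."""
    rng = rng or random.Random()
    return [rng.choice(vocab) if rng.random() < p else tok for tok in tokens]


# Toy placeholder data, for illustration only.
toy_pairs = [("a b", "x y"), ("c d", "z w"), ("e f", "u v")]
print(concat_pairs(toy_pairs, n_new=2, rng=random.Random(0)))
print(switch_out_simplified(["a", "b", "c", "d"], vocab=["q", "r"], p=0.5, rng=random.Random(0)))
```

In a setup like this, the same token-replacement function would be applied to the source and target sides with their own vocabularies, and the augmented pairs would be added to the original training data instead of replacing it.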
Authors

Mardiyyah Oduwole, ML Collective (ML Efficiency; NLP for low resource languages & Social Good)
Oluwatosin Olajide, ML Collective
Jamiu Suleiman, ML Collective
Faith Hunja, ML Collective
Busayo Awobade, Research Scientist, ML Collective (Speech processing; Multilinguality)
Fatimo Adebanjo, ML Collective
Comfort Akanni, ML Collective
Chinonyelum Igwe, ML Collective
Peace Ododo, ML Collective
Promise Omoigui, ML Collective
Steven Kolawole, Carnegie Mellon University (ML Efficiency)
Abraham Owodunni, The Ohio State University (Multilingual NLP; Low-resource NLP; Efficient ML)