From Scarcity to Efficiency: Investigating the Effects of Data Augmentation on African Machine Translation

📅 2025-09-09

🤖 AI Summary

Machine translation (MT) for low-resource African languages is severely constrained by scarce parallel training data. This paper investigates two data augmentation techniques, sentence concatenation with back-translation and switch-out, within a neural MT framework, with experiments across six low-resource African languages. The reported results show a minimum BLEU score increase of 25% across all six languages. The work offers empirical evidence and analysis of how these augmentation techniques can make translation systems for under-resourced languages more robust.

📝 Abstract

The linguistic diversity across the African continent presents different challenges and opportunities for machine translation. This study explores the effects of data augmentation techniques in improving translation systems in low-resource African languages. We focus on two data augmentation techniques: sentence concatenation with back translation and switch-out, applying them across six African languages. Our experiments show significant improvements in machine translation performance, with a minimum increase of 25% in BLEU score across all six languages. We provide a comprehensive analysis and highlight the potential of these techniques to improve machine translation systems for low-resource languages, contributing to the development of more robust translation systems for under-resourced languages.
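The abstract does not spell out how sentence concatenation is combined with back translation, so the following is only a minimal sketch under stated assumptions: back translation is taken to mean pairing monolingual target-side sentences with synthetic sources produced by a reverse model, and concatenation is taken to mean joining two randomly chosen sentence pairs into one longer pair. The names `back_translate`, `concat_pairs`, and `reverse_translate` are hypothetical, and `reverse_translate` stands in for whatever target-to-source system is available.

```python
import random
from typing import Callable, List, Optional, Tuple

Pair = Tuple[str, str]  # (source sentence, target sentence)


def back_translate(mono_targets: List[str],
                   reverse_translate: Callable[[str], str]) -> List[Pair]:
    """Pair each monolingual target sentence with a synthetic source.

    `reverse_translate` is a hypothetical stand-in for a target-to-source
    MT model; any callable mapping a string to a string works here.
    """
    return [(reverse_translate(t), t) for t in mono_targets]


def concat_pairs(pairs: List[Pair], n_new: int, sep: str = " ",
                 rng: Optional[random.Random] = None) -> List[Pair]:
    """Create `n_new` longer examples by joining two distinct sentence pairs.

    Sources are joined with sources and targets with targets, in the same
    order, so the concatenated pair stays aligned.
    """
    rng = rng or random.Random()
    if len(pairs) < 2:
        return []  # nothing to concatenate
    new_pairs = []
    for _ in range(n_new):
        (s1, t1), (s2, t2) = rng.sample(pairs, 2)
        new_pairs.append((s1 + sep + s2, t1 + sep + t2))
    return new_pairs
```

In this sketch the augmented training set would be the original pairs plus the synthetic pairs plus the concatenated pairs; how the paper actually mixes or weights them is not stated in the abstract.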

Problem

Research questions and friction points this paper is trying to address.

Improving machine translation for low-resource African languages

Investigating data augmentation effects on translation performance

Addressing linguistic diversity challenges with sentence concatenation techniques

Innovation

Methods, ideas, or system contributions that make the work stand out.

Data augmentation techniques for low-resource languages

Sentence concatenation with back translation method

Switch-out technique improving translation performance significantly
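As a rough illustration of the switch-out idea, the sketch below replaces a randomly chosen number of tokens in a sentence with tokens drawn uniformly from a vocabulary. It is a simplified version that assumes the number of replacements is sampled with weight proportional to exp(-n / tau); the function name `switch_out`, the `tau` default, and the toy vocabulary are illustrative choices, not settings taken from the paper.

```python
import math
import random
from typing import List, Optional


def switch_out(tokens: List[str], vocab: List[str], tau: float = 1.0,
               rng: Optional[random.Random] = None) -> List[str]:
    """Return a copy of `tokens` with some positions replaced at random.

    The number of replaced positions n (0..len(tokens)) is sampled with
    weight exp(-n / tau), so a smaller `tau` means fewer replacements.
    A replacement may coincide with the original token by chance.
    """
    rng = rng or random.Random()
    length = len(tokens)
    if length == 0:
        return []
    # Exponentially decaying weights over the number of replacements.
    weights = [math.exp(-n / tau) for n in range(length + 1)]
    n_swaps = rng.choices(range(length + 1), weights=weights, k=1)[0]
    # Choose which positions to overwrite, without repetition.
    positions = rng.sample(range(length), n_swaps)
    out = list(tokens)
    for pos in positions:
        out[pos] = rng.choice(vocab)
    return out
```

In a training pipeline this kind of function would typically be applied independently to the source and target side of each pair, each with its own vocabulary; whether the paper does so is an assumption here.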
