🤖 AI Summary
This study addresses the limited performance of natural language processing for low-resource African languages such as Hausa and Fongbe (Fon), a limitation rooted in data scarcity. It systematically evaluates two data augmentation strategies, synthetic data generation with the large language model Gemini 2.5 Flash and back-translation with NLLB-200, on named entity recognition (NER) and part-of-speech (POS) tagging, using the MasakhaNER 2.0 and MasakhaPOS benchmarks. The findings indicate that augmentation efficacy is governed primarily by task type rather than by language or LLM generation quality alone: the same LLM-generated data hurts Fongbe NER while helping Fongbe POS tagging, which challenges the common assumption that LLM generation quality predicts augmentation success. Specifically, neither method improved NER for either language, and LLM augmentation lowered F1 by 0.24% for Hausa and 1.81% for Fongbe. In POS tagging, LLM-based augmentation slightly raised accuracy for Fongbe (+0.33%) and back-translation did so for Hausa (+0.17%); the remaining combinations were ineffective or detrimental, including a 0.35% drop for back-translation on Fongbe.
📝 Abstract
Data scarcity limits NLP development for low-resource African languages. We evaluate two data augmentation methods -- LLM-based generation (Gemini 2.5 Flash) and back-translation (NLLB-200) -- for Hausa and Fongbe, two West African languages that differ substantially in LLM generation quality. We assess augmentation on named entity recognition (NER) and part-of-speech (POS) tagging using MasakhaNER 2.0 and MasakhaPOS benchmarks. Our results reveal that augmentation effectiveness depends on task type rather than language or LLM quality alone. For NER, neither method improves over baseline for either language; LLM augmentation reduces Hausa NER by 0.24% F1 and Fongbe NER by 1.81% F1. For POS tagging, LLM augmentation improves Fongbe by 0.33% accuracy, while back-translation improves Hausa by 0.17%; back-translation reduces Fongbe POS by 0.35% and has negligible effect on Hausa POS. The same LLM-generated synthetic data produces opposite effects across tasks for Fongbe -- hurting NER while helping POS -- suggesting task structure governs augmentation outcomes more than synthetic data quality. These findings challenge the assumption that LLM generation quality predicts augmentation success, and provide actionable guidance: data augmentation should be treated as a task-specific intervention rather than a universally beneficial preprocessing step.
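For readers unfamiliar with back-translation, the round trip can be sketched roughly as follows. This is a minimal illustration and not the authors' pipeline: the function `back_translate`, the `translate` callable and its signature, the choice of English as the pivot language, and the filtering rule are all assumptions made for the example. In practice `translate` would wrap a translation model such as NLLB-200, which identifies languages by FLORES-200 codes (`hau_Latn` for Hausa, `fon_Latn` for Fon).

```python
from typing import Callable, List

# Hypothetical signature: translate(sentences, src_lang, tgt_lang) -> translations.
# Any machine translation backend (for example an NLLB-200 wrapper) could be plugged in.
Translator = Callable[[List[str], str, str], List[str]]


def back_translate(
    sentences: List[str],
    translate: Translator,
    src_lang: str = "hau_Latn",    # FLORES-200 code for Hausa
    pivot_lang: str = "eng_Latn",  # assumed pivot; the abstract does not name one
) -> List[str]:
    """Round-trip each sentence through the pivot language and back."""
    pivoted = translate(sentences, src_lang, pivot_lang)
    restored = translate(pivoted, pivot_lang, src_lang)
    # Keep only outputs that are non-empty and differ from the source sentence
    # (an assumed filter: an unchanged sentence adds no new training signal).
    return [
        new
        for old, new in zip(sentences, restored)
        if new.strip() and new.strip() != old.strip()
    ]
```

Because NER and POS labels are attached to individual tokens, a paraphrased sentence cannot simply inherit the labels of its source. The tags must be re-projected or re-annotated, and the abstract does not say how this step was handled.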