🤖 AI Summary
This study addresses the challenge of building effective AI language learning systems for low-resource African languages, which suffer from a scarcity of high-quality training data. To this end, the authors propose a data generation approach leveraging AFRILANGDICT, a newly constructed dictionary covering ten African languages aligned with English, to create AFRILANGEDU—a scalable and verifiable multi-turn question-answering educational dataset. Using this dataset, they train AI language tutoring models on multilingual large language models such as Llama-3-8B-IT and Gemma-2-9B-IT through supervised fine-tuning (SFT) combined with direct preference optimization (DPO). Experimental results demonstrate that the proposed method significantly outperforms baseline approaches across four automatic evaluation metrics, with the joint SFT and DPO training yielding performance gains ranging from 1.8% to 15.5%.
📝 Abstract
How can language learning systems be developed for languages that lack sufficient training resources? This challenge is increasingly faced by developers across the African continent who aim to build AI systems capable of understanding and responding in local languages. To address this gap, we introduce AFRILANGDICT, a collection of 194.7K African language-English dictionary entries designed as seed resources for generating language-learning materials, enabling us to automatically construct large-scale, diverse, and verifiable student-tutor question-answer interactions suitable for training AI-assisted language tutors. Using AFRILANGDICT, we build AFRILANGEDU, a dataset of 78.9K multi-turn training examples for Supervised Fine-Tuning (SFT) and Direct Preference Optimization (DPO). Using AFRILANGEDU, we train language tutoring models collectively referred to as AFRILANGTUTOR. We fine-tune two multilingual LLMs: Llama-3-8B-IT and Gemma-3-12B-IT on AFRILANGEDU across 10 African languages and evaluate their performance. Our results show that models trained on AFRILANGEDU consistently outperform their base counterparts, and combining SFT and DPO yields substantial improvements, with gains ranging from 1.8% to 15.5% under LLM-as-a-judge evaluations across four criteria. To facilitate further research on low-resource languages -- all resources are available at https://huggingface.co/afrilang-edu.