Where Are We? Evaluating LLM Performance on African Languages

📅 2025-02-26
📈 Citations: 0
Influential: 0
🤖 AI Summary
This study investigates how language policies engendering data inequality systematically degrade large language model (LLM) performance on African languages. To address this, we introduce Sahara—the first comprehensive, continent-wide evaluation benchmark covering Africa’s full linguistic diversity, including both official and indigenous languages (50+ languages), and uniquely integrating sociolinguistic theory into NLP evaluation. Our methodology comprises multilingual benchmark construction, zero- and few-shot cross-lingual evaluation, data-scarcity attribution analysis, and causal modeling linking policy, data availability, and model performance. Results reveal that only a handful of languages (e.g., Swahili) achieve usable performance levels; over 70% of indigenous languages suffer severe degradation due to extreme training data sparsity. We propose an actionable data-inclusivity framework and policy-coordination pathways, offering both a methodological paradigm and practical guidelines for advancing global linguistic equity in AI.

📝 Abstract
Africa's rich linguistic heritage remains underrepresented in NLP, largely due to historical policies that favor foreign languages and create significant data inequities. In this paper, we integrate theoretical insights on Africa's language landscape with an empirical evaluation using Sahara - a comprehensive benchmark curated from large-scale, publicly accessible datasets capturing the continent's linguistic diversity. By systematically assessing the performance of leading large language models (LLMs) on Sahara, we demonstrate how policy-induced data variations directly impact model effectiveness across African languages. Our findings reveal that while a few languages perform reasonably well, many Indigenous languages remain marginalized due to sparse data. Leveraging these insights, we offer actionable recommendations for policy reforms and inclusive data practices. Overall, our work underscores the urgent need for a dual approach - combining theoretical understanding with empirical evaluation - to foster linguistic diversity in AI for African communities.
Problem

Research questions and friction points this paper is trying to address.

Evaluate LLM performance on African languages.
Address data inequities in African NLP.
Assess impact of policy on model effectiveness.
Innovation

Methods, ideas, or system contributions that make the work stand out.

A comprehensive benchmark for African languages.
Evaluation of LLMs on diverse linguistic datasets.
Policy reforms for inclusive data practices.