IrokoBench: A New Benchmark for African Languages in the Age of Large Language Models

📅 2024-06-05
🏛️ arXiv.org
📈 Citations: 1
Influential: 0
🤖 AI Summary
This study addresses the evaluation of large language model (LLM) capabilities for low-resource African languages. We introduce IrokoBench, a human-translated, multi-task benchmark covering 17 typologically diverse African languages across three tasks: natural language inference, mathematical reasoning, and multiple-choice knowledge-based question answering. Using IrokoBench, we assess zero-shot, few-shot, and translate-test performance (where test sets are machine-translated into English) across 10 open and 6 proprietary LLMs. The evaluation reveals a substantial gap between high-resource languages (such as English and French) and low-resource African languages, as well as between open and proprietary models: the best open model, Gemma 2 27B, reaches only 63% of the performance of the best proprietary model, GPT-4o. Machine-translating the test set into English before evaluation helps close the gap for larger English-centric models such as Gemma 2 27B and LLaMA 3.1 70B.

📝 Abstract
Despite the widespread adoption of large language models (LLMs), their remarkable capabilities remain limited to a few high-resource languages. Additionally, many low-resource languages (e.g., African languages) are often evaluated only on basic text classification tasks due to the lack of appropriate or comprehensive benchmarks outside of high-resource languages. In this paper, we introduce IrokoBench -- a human-translated benchmark dataset for 17 typologically-diverse low-resource African languages covering three tasks: natural language inference (AfriXNLI), mathematical reasoning (AfriMGSM), and multi-choice knowledge-based question answering (AfriMMLU). We use IrokoBench to evaluate zero-shot, few-shot, and translate-test settings (where test sets are translated into English) across 10 open and six proprietary LLMs. Our evaluation reveals a significant performance gap between high-resource languages (such as English and French) and low-resource African languages, and between open and proprietary models, with the best-performing open model, Gemma 2 27B, reaching only 63% of the performance of the best-performing proprietary model, GPT-4o. In addition, machine-translating the test set to English before evaluation helped to close the gap for larger English-centric models, such as Gemma 2 27B and LLaMA 3.1 70B. These findings suggest that more efforts are needed to develop and adapt LLMs for African languages.
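The translate-test setting described in the abstract can be illustrated with a minimal sketch. The `translate_to_english` and `llm_answer` functions below are hypothetical stand-ins for a machine-translation system and an LLM call (they are not part of IrokoBench or any real API); the point is only the shape of the evaluation loop: in the direct setting the model sees the African-language question, while in the translate-test setting the question is first machine-translated into English.

```python
def translate_to_english(text: str) -> str:
    # Hypothetical MT step; a real pipeline would call an MT model here.
    toy_mt = {"Kini 2 + 2?": "What is 2 + 2?"}
    return toy_mt.get(text, text)

def llm_answer(prompt: str) -> str:
    # Hypothetical English-centric toy model: it only answers English prompts.
    return "4" if prompt == "What is 2 + 2?" else "?"

def evaluate(examples, translate_test: bool = False) -> float:
    """Accuracy under the direct or translate-test setting."""
    correct = 0
    for question, gold in examples:
        if translate_test:
            # Translate-test: evaluate the model on an English translation.
            question = translate_to_english(question)
        if llm_answer(question) == gold:
            correct += 1
    return correct / len(examples)

examples = [("Kini 2 + 2?", "4")]  # toy Yoruba-style math item
print(evaluate(examples))                       # direct setting
print(evaluate(examples, translate_test=True))  # translate-test setting
```

Under these toy stand-ins, the English-centric model fails in the direct setting but succeeds after translation, mirroring the paper's finding that translate-test helps English-centric models.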
Problem

Research questions and friction points this paper is trying to address.

African Languages
Large Language Models
Performance Improvement
Innovation

Methods, ideas, or system contributions that make the work stand out.

IrokoBench
African Languages
Model Performance Evaluation
David Ifeoluwa Adelani
McGill University and Mila - Quebec AI Institute and Canada CIFAR AI Chair
Natural language processing, Multilinguality, Multilingual NLP, AfricaNLP, Low-resource NLP
Jessica Ojo
Lelapa AI, Masakhane NLP
Israel Abebe Azime
Saarland University
NLP | Multimodal learning | Deep Learning Applications
Zhuang Yun Jian
University of Toronto
Jesujoba Oluwadara Alabi
Saarland University
Natural Language Processing, Neural Machine Translation, Machine Learning, Information Extraction
Xuanli He
UCL
Natural Language Processing, AI Safety, Machine Learning
Millicent Ochieng
Microsoft
Natural Language Processing, Machine Learning, Artificial Intelligence
Sara Hooker
Head of Cohere For AI
Machine learning efficiency, robustness, interpretability, trustworthy ML
Andiswa Bukula
SADiLaR, Masakhane NLP
En-Shiun Annie Lee
Ontario Tech University, and University of Toronto (Status-Only)
Natural Language Processing, Data Mining, Pattern Analysis
Chiamaka Chukwuneke
Lancaster University
Happy Buzaaba
Princeton University
Blessing K. Sibanda
Masakhane NLP
Godson Kalipe
Masakhane NLP
Jonathan Mukiibi
Makerere University, Masakhane NLP
Salomon Kabongo
Leibniz Universität Hannover, Masakhane NLP
Foutse Yuehgoh
Le CNAM, Masakhane NLP
M. Setaka
SADiLaR, Masakhane NLP
Lolwethu Ndolela
Masakhane NLP
N. Odu
Masakhane NLP
Rooweither Mabuya
SADiLaR, Masakhane NLP
Shamsuddeen Hassan Muhammad
Bayero University, Kano, & Google DeepMind Academic Fellow at Imperial College London
Natural Language Processing, Sentiment Analysis, AfricaNLP, Low-resource NLP, Multilinguality
Salomey Osei
University of Deusto
Machine Learning, NLP, AutoML
Sokhar Samb
DAUST, Masakhane NLP
Tadesse Kebede Guge
Haramaya University, Masakhane NLP
Pontus Stenetorp
University College London, Masakhane NLP