AfriScience-MT: Towards Decolonizing Science in Africa through Text Translation

2026-05-28
Citations: 0
Influential: 0
AI Summary
This study addresses the marginalization of African languages in scientific communication, which limits how hundreds of millions of speakers access and produce scientific knowledge. A core obstacle is the lack of established scientific terminology in these languages. To bridge this gap, the authors present AfriScience-MT, a parallel corpus spanning six African languages and eleven scientific domains, built by professional translators working with expert science communicators, who translated plain-language summaries of scientific papers and coined technical terms where none existed. The resource supports machine translation evaluation in zero-shot, few-shot, and fine-tuned settings. Closed-source models lead: GPT-5.4 and Gemini-3.1-Flash-Lite reach average sentence-level COMET scores of 68.3 and 68.0, respectively. Among open systems, fine-tuned NLLB-1.3B reaches 67.3 at the sentence level, and TranslateGemma-12B reaches 44.0 at the document level with 1-shot in-context learning. The corpus is released to support benchmarking and document-level scientific MT for African languages.
Abstract
The dominance of colonial languages in African education and scientific communication limits how hundreds of millions of speakers of African languages access and produce scientific knowledge. A core obstacle is the lack of established scientific terminology in these languages. We introduce AfriScience-MT, a parallel corpus covering six African languages (Amharic, Hausa, Luganda, Northern Sotho, Yorùbá, and isiZulu) across 11 scientific domains. Professional translators, working with expert science communicators, translated plain-language summaries of scientific papers into each target language and created new terms where none existed. We benchmark machine translation systems and large language models in zero-shot, few-shot, and fine-tuned settings. Our results show that closed-source models outperform all open-source models at both the sentence and document levels: GPT-5.4 and Gemini-3.1-Flash-Lite lead with average sentence-level COMET scores of 68.3 and 68.0, respectively, and tie at an average document-level COMET of 48.3. Among open systems, fine-tuned NLLB-1.3B reaches 67.3 at the sentence level, and TranslateGemma-12B reaches 44.0 at the document level with 1-shot in-context learning. We release AfriScience-MT to support benchmarking and document-level scientific MT for African languages.
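The zero-shot and few-shot settings mentioned in the abstract differ only in whether demonstration pairs are placed in the prompt before the text to translate. The following is a minimal sketch of that idea, not the authors' prompt: the function name `build_prompt`, the instruction wording, and the placeholder strings are all assumptions made for illustration.

```python
from typing import Sequence, Tuple


def build_prompt(
    src_lang: str,
    tgt_lang: str,
    text: str,
    examples: Sequence[Tuple[str, str]] = (),
) -> str:
    """Assemble a translation prompt (hypothetical format).

    With no examples this is the zero-shot case; with one
    (source, reference) pair it is the 1-shot case, and so on.
    """
    lines = [
        f"Translate the following scientific text from {src_lang} to {tgt_lang}."
    ]
    # Each demonstration is a source sentence followed by its reference translation.
    for src, ref in examples:
        lines.append(f"{src_lang}: {src}")
        lines.append(f"{tgt_lang}: {ref}")
    # The text to translate comes last, with the target side left open.
    lines.append(f"{src_lang}: {text}")
    lines.append(f"{tgt_lang}:")
    return "\n".join(lines)


# Placeholder strings only; no real corpus data is shown here.
zero_shot = build_prompt("English", "Hausa", "<source summary>")
one_shot = build_prompt(
    "English",
    "Hausa",
    "<source summary>",
    examples=[("<example source>", "<example reference translation>")],
)
```

The fine-tuned setting is different in kind: there the model weights are updated on the parallel data, so no demonstrations need to appear in the prompt.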
Problem

Research questions and friction points this paper is trying to address.

decolonization
scientific terminology
African languages
knowledge access
science communication
Innovation

Methods, ideas, or system contributions that make the work stand out.

scientific machine translation
African languages
parallel corpus
terminology creation
document-level MT
Idris Abdulmumin
Postdoctoral Fellow, DSFSI, University of Pretoria
Machine Translation, Neural Machine Translation, Natural Language Processing, Internet Technology
Tajuddeen Gwadabe
University of Chinese Academy of Sciences
Data mining, Recommender Systems, NLP
Shamsuddeen Hassan Muhammad
Bayero University, Kano, & Google DeepMind Academic Fellow at Imperial College London
Natural Language Processing, Sentiment Analysis, AfricaNLP, Low-resource NLP, Multilinguality
David Ifeoluwa Adelani
McGill University and Mila - Quebec AI Institute and Canada CIFAR AI Chair
Natural language processing, Multilinguality, Multilingual NLP, AfricaNLP, Low-resource NLP
Nomonde Khalo
University of Cape Town
Ibrahim Said Ahmad
Northeastern University
Natural Language Processing, Big Data, Data mining, Artificial Intelligence
Abiodun Modupe
Data Science for Social Impact, University of Pretoria
Anina Mumm
Independent Consultant
Sibusiso Biyela
University of South Africa
Michelle Rabie
Independent Researcher
Johanna Havemann
Access 2 Perspectives
Marek Rei
Associate Professor, Imperial College London
Artificial Intelligence, Language Modeling, Machine Learning, Natural Language Processing
Jade Abbott
Masakhane, Lelapa AI
Natural Language Processing, Artificial Intelligence, Computational Intelligence
Vukosi Marivate
University of Pretoria, Lelapa AI, Deep Learning Indaba, Masakhane Research Foundation
Data Science, Natural Language Processing, Machine Learning, Artificial Intelligence, Reinforcement