AfrIFact: Cultural Information Retrieval, Evidence Extraction and Fact Checking for African Languages

📅 2026-04-01
🤖 AI Summary
This study addresses online misinformation in low-resource African language communities, particularly on health and cultural topics, where automated fact-checking tools are scarce. The authors introduce AfrIFact, a multilingual dataset covering ten African languages and English that supports the full fact-checking pipeline: information retrieval, evidence extraction, and claim verification. They evaluate embedding models for retrieval and LLMs for verification, including AfriqueQwen-14B, a large language model tailored for African languages, with few-shot prompting and task-specific fine-tuning. Few-shot prompting improves AfriqueQwen-14B's performance by up to 43%, and task-specific fine-tuning improves fact-checking accuracy by up to a further 26%. Cultural and news documents prove easier to retrieve than healthcare documents. Even the best embedding models show weak cross-lingual retrieval, and LLMs lack robust multilingual fact-verification capabilities in African languages.
📝 Abstract
Assessing the veracity of a claim made online is a complex and important task with real-world implications. When these claims are directed at communities with limited access to information and the content concerns issues such as healthcare and culture, the consequences intensify, especially in low-resource languages. In this work, we introduce AfrIFact, a dataset that covers the necessary steps for automatic fact-checking (i.e., information retrieval, evidence extraction, and fact checking), in ten African languages and English. Our evaluation results show that even the best embedding models lack cross-lingual retrieval capabilities, and that cultural and news documents are easier to retrieve than healthcare-domain documents, both in large corpora and in single documents. We show that LLMs lack robust multilingual fact-verification capabilities in African languages, while few-shot prompting improves performance by up to 43% in AfriqueQwen-14B, and task-specific fine-tuning further improves fact-checking accuracy by up to 26%. These findings, along with our release of the AfrIFact dataset, encourage work on low-resource information retrieval, evidence retrieval, and fact checking.
Problem

Research questions and friction points this paper is trying to address.

fact-checking
low-resource languages
information retrieval
evidence extraction
African languages
Innovation

Methods, ideas, or system contributions that make the work stand out.

low-resource languages
cross-lingual retrieval
fact-checking
few-shot prompting
multilingual LLMs
Authors

Israel Abebe Azime
Saarland University
NLP | Multimodal learning | Deep Learning Applications
Jesujoba Oluwadara Alabi
Saarland University
Natural Language Processing | Neural Machine Translation | Machine Learning | Information Extraction
Crystina Zhang
University of Waterloo
Information Retrieval | Natural Language Processing
Iffat Maab
National Institute of Informatics, Japan
Atnafu Lambebo Tonja
Postdoc at MBZUAI
NLP for low-resource languages | Multilingual language models | Speech Technology
Tadesse Destaw Belay
Ph.D. candidate at IPN, Mexico
NLP for low-resource languages | Machine learning | LLMs
Folasade Peace Alabi
University of Ilorin, Nigeria
Salomey Osei
University of Deusto
Machine Learning | NLP | Auto ML
Saminu Mohammad Aliyu
Bayero University, Nigeria
Nkechinyere Faith Aguobi
University of Lagos, Nigeria
Bontu Fufa Balcha
Addis Ababa University, Ethiopia
Blessing Kudzaishe Sibanda
Davis David
Black Swan
Mouhamadane Mboup
Universite Alioune Diop, Senegal
Daud Abolade
Neo Putini
Philipp Slusallek
Professor for Computer Graphics, Saarland University & DFKI, Saarland Informatics Campus
Visual Computing | Computer Graphics | Artificial Intelligence & Machine Learning | High-Performance Computing
David Ifeoluwa Adelani
McGill University and Mila - Quebec AI Institute and Canada CIFAR AI Chair
Natural language processing | Multilinguality | Multilingual NLP | AfricaNLP | Low-resource NLP
Dietrich Klakow
Saarland University, Saarland Informatics Campus, PharmaScienceHub
Natural Language Processing | Speech Processing | Question Answering | Machine Learning