AI Summary
Existing multilingual dialogue benchmarks exhibit strong bias toward high-resource and Western-centric languages, neglecting cultural appropriateness and linguistic diversity in low-resource African languages.
Method: We introduce Injongo -- the first open-source, fully localized intent classification and slot filling benchmark covering 16 African languages, with utterances authored by native speakers and grounded in authentic scenarios (e.g., banking, travel). Unlike translation-based approaches, it employs a hybrid annotation pipeline combining multi-round human verification with GPT-4o-assisted labeling.
Contribution/Results: We conduct the first systematic evaluation of mainstream LLMs and fine-tuned models on African languages, revealing substantial performance gaps: GPT-4o achieves only a 26.0 F1-score for slot filling and 70.6% intent accuracy, whereas a culturally adapted multilingual Transformer model attains an 81.2 F1-score and 85.7% accuracy. These results demonstrate that natively collected, culturally grounded data yields critical gains for cross-lingual NLU transfer, establishing a new paradigm for low-resource language evaluation and modeling.
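The two metrics reported above can be made concrete with a minimal sketch: intent-detection accuracy is exact-match over one label per utterance, while slot-filling F1 is typically micro-averaged over predicted slot spans. The toy utterance labels below are hypothetical illustrations, not data from the Injongo benchmark.

```python
# Sketch of intent accuracy and span-level slot F1.
# All labels/spans here are illustrative toy data, not Injongo data.

def intent_accuracy(gold, pred):
    """Fraction of utterances whose predicted intent matches the gold intent."""
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)

def slot_f1(gold_spans, pred_spans):
    """Micro-averaged F1 over (slot_type, start, end) spans."""
    tp = fp = fn = 0
    for g, p in zip(gold_spans, pred_spans):
        g, p = set(g), set(p)
        tp += len(g & p)   # spans predicted with correct type and boundaries
        fp += len(p - g)   # spurious predictions
        fn += len(g - p)   # missed gold spans
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0

gold_intents = ["book_flight", "check_balance", "book_flight"]
pred_intents = ["book_flight", "transfer_money", "book_flight"]

gold_slots = [[("destination", 3, 4)], [("account", 2, 3)], [("date", 5, 6)]]
pred_slots = [[("destination", 3, 4)], [], [("date", 5, 6), ("time", 7, 8)]]

acc = intent_accuracy(gold_intents, pred_intents)
f1 = slot_f1(gold_slots, pred_slots)
print(round(acc, 3), round(f1, 3))  # 0.667 0.667
```

Note that span-level F1 is stricter than token-level accuracy: a slot only counts as correct when its type and both boundaries match, which is one reason slot-filling scores sit far below intent accuracy in the results above.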
Abstract
Slot filling and intent detection are well-established tasks in Conversational AI. However, current large-scale benchmarks for these tasks often exclude evaluations of low-resource languages and rely on translations from English benchmarks, thereby predominantly reflecting Western-centric concepts. In this paper, we introduce Injongo -- a multicultural, open-source benchmark dataset for 16 African languages with utterances generated by native speakers across diverse domains, including banking, travel, home, and dining. Through extensive experiments, we benchmark fine-tuned multilingual transformer models and prompted large language models (LLMs), and show the advantage of leveraging African-cultural utterances over Western-centric utterances for improving cross-lingual transfer from English. Experimental results reveal that current LLMs struggle with the slot-filling task, with GPT-4o achieving an average performance of 26 F1-score. In contrast, intent detection performance is notably better, with an average accuracy of 70.6%, though it still falls behind the fine-tuned baselines. On English, GPT-4o and the fine-tuned baselines perform similarly on intent detection, both achieving an accuracy of approximately 81%. Our findings suggest that LLM performance still lags for many low-resource African languages, and more work is needed to further improve their downstream performance.
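To make the two tasks concrete, the sketch below shows the annotation format commonly used in slot-filling/intent-detection benchmarks: one intent label per utterance and one BIO tag per token, from which typed slot spans are recovered. The English banking utterance and label inventory are hypothetical illustrations, not actual Injongo annotations.

```python
# Hypothetical intent + slot annotation in the common BIO scheme
# (illustrative example only, not taken from the Injongo dataset).

utterance = ["transfer", "500", "naira", "to", "my", "savings", "account"]
intent = "transfer_money"                         # one label per utterance
slots = ["O", "B-amount", "I-amount", "O", "O", "B-account_type", "O"]

def bio_to_spans(tags):
    """Convert BIO tags into (slot_type, start, end) spans, end-exclusive."""
    spans, start, typ = [], None, None
    for i, t in enumerate(tags):
        if t.startswith("B-"):                    # a new span begins
            if start is not None:
                spans.append((typ, start, i))
            start, typ = i, t[2:]
        elif t.startswith("I-") and start is not None and t[2:] == typ:
            continue                              # span continues
        else:                                     # "O" or inconsistent tag
            if start is not None:
                spans.append((typ, start, i))
            start, typ = None, None
    if start is not None:                         # close a span at the end
        spans.append((typ, start, len(tags)))
    return spans

print(bio_to_spans(slots))  # [('amount', 1, 3), ('account_type', 5, 6)]
```

An intent classifier is scored by accuracy over the utterance-level labels, while a slot filler is scored by F1 over the recovered spans, which matches the two numbers reported in the abstract.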