🤖 AI Summary
This survey addresses the lack of cultural competence in contemporary multilingual NLP models, which often misinterpret expressions rooted in specific cultural contexts despite broad language coverage. Synthesizing over 50 studies published between 2020 and 2026, it advocates a shift from processing languages in isolation toward modeling the "communicative ecology," treating institutional norms, cultural scripts, and community practices as essential contextual dimensions. Drawing on work on culturally aware evaluation benchmarks (e.g., Global-MMLU, CulturalBench), multimodal grounding of local knowledge, community co-constructed datasets, and cultural alignment techniques, it finds that training data coverage, while a strong determinant of performance, is not the sole bottleneck: prompt language, tokenization, and translated benchmark design also materially affect outcomes. The paper calls for culturally stratified evaluation and participatory alignment approaches to advance fair, inclusive, and culturally grounded NLP systems.
📝 Abstract
Recent progress in multilingual NLP is often taken as evidence of broader global inclusivity, but a growing literature shows that multilingual capability and cultural competence come apart. This paper synthesizes over 50 papers from 2020–2026 spanning multilingual performance inequality, cross-lingual transfer, culture-aware evaluation, cultural alignment, multimodal local-knowledge modeling, benchmark design critiques, and community-grounded data practices. Across this literature, training data coverage remains a strong determinant of performance, yet it is not sufficient: tokenization, prompt language, translated benchmark design, culturally specific supervision, and multimodal context all materially affect outcomes. Recent work on Global-MMLU, CDEval, WorldValuesBench, CulturalBench, CULEMO, CulturalVQA, GIMMICK, DRISHTIKON, WorldCuisines, CARE, CLCA, and newer critiques of benchmark design and community-grounded evaluation shows that strong multilingual models can still flatten local norms, misread culturally grounded cues, and underperform in lower-resource or community-specific settings. We argue that the field should move from treating languages as isolated rows in a benchmark spreadsheet toward modeling communicative ecologies: the institutions, scripts, translation pipelines, domains, modalities, and communities through which language is used. On that basis, we propose a research agenda for culturally grounded NLP centered on richer contextual metadata, culturally stratified evaluation, participatory alignment, within-language variation, and multimodal community-aware design.