All Languages Matter: Evaluating LMMs on Culturally Diverse 100 Languages

📅 2024-11-25
🏛️ arXiv.org
📈 Citations: 8 (influential: 0)
🤖 AI Summary
Contemporary large multimodal models (LMMs) suffer from narrow cultural coverage, weak support for low-resource languages, and insufficient cross-cultural visual–linguistic reasoning capabilities. To address these limitations, we introduce ALM-bench—the first multimodal evaluation benchmark covering 100 languages (including numerous low-resource ones) and 13 cultural dimensions. Methodologically, we propose a hierarchical question-type design (true/false, multiple-choice, open-ended QA), integrate human-annotated multilingual image–text pairs, employ cultural knowledge graphs to guide content sampling, and establish a standardized evaluation protocol enabling multi-granularity assessment. Comprehensive experiments on leading open- and closed-source LMMs systematically expose their significant performance deficits on low-resource language understanding and culture-specific reasoning tasks—revealing these shortcomings for the first time at scale. This work advances the development of globally accessible, culturally inclusive LMMs and provides both a novel paradigm and foundational infrastructure for cross-cultural multimodal understanding research.

📝 Abstract
Existing Large Multimodal Models (LMMs) generally focus on only a few regions and languages. As LMMs continue to improve, it is increasingly important to ensure they understand cultural contexts, respect local sensitivities, and support low-resource languages, all while effectively integrating corresponding visual cues. In pursuit of culturally diverse global multimodal models, our proposed All Languages Matter Benchmark (ALM-bench) represents the largest and most comprehensive effort to date for evaluating LMMs across 100 languages. ALM-bench challenges existing models by testing their ability to understand and reason about culturally diverse images paired with text in various languages, including many low-resource languages traditionally underrepresented in LMM research. The benchmark offers a robust and nuanced evaluation framework featuring various question formats, including true/false, multiple choice, and open-ended questions, which are further divided into short and long-answer categories. ALM-bench design ensures a comprehensive assessment of a model's ability to handle varied levels of difficulty in visual and linguistic reasoning. To capture the rich tapestry of global cultures, ALM-bench carefully curates content from 13 distinct cultural aspects, ranging from traditions and rituals to famous personalities and celebrations. Through this, ALM-bench not only provides a rigorous testing ground for state-of-the-art open and closed-source LMMs but also highlights the importance of cultural and linguistic inclusivity, encouraging the development of models that can serve diverse global populations effectively. Our benchmark is publicly available.
Problem

Research questions and friction points this paper is trying to address.

Evaluating LMMs on 100 culturally diverse languages
Assessing LMMs' understanding of cultural contexts and low-resource languages
Testing LMMs' visual and linguistic reasoning across varied cultural aspects
Innovation

Methods, ideas, or system contributions that make the work stand out.

First LMM benchmark spanning 100 languages and 13 cultural aspects
Pairs culturally diverse images with text, including many low-resource languages
Uses varied question formats (true/false, multiple choice, short/long open-ended) for nuanced evaluation
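The evaluation setup described above (multiple question formats, per-language scoring) can be sketched as a minimal loop. The item schema, sample data, and scoring rules below are illustrative assumptions for demonstration only, not the benchmark's actual code or data:

```python
# Illustrative sketch of an ALM-bench-style evaluation loop.
# The BenchItem schema, sample items, and scoring rules are assumptions;
# the real benchmark defines its own formats and judging protocol.
from dataclasses import dataclass
from collections import defaultdict

@dataclass
class BenchItem:
    language: str   # e.g. "Swahili", one of the 100 languages
    aspect: str     # one of the 13 cultural aspects
    qtype: str      # "true/false" | "mcq" | "open-short" | "open-long"
    question: str
    reference: str  # ground-truth answer

def score(item: BenchItem, prediction: str) -> float:
    """Exact match for closed formats; open-ended answers would need
    human or LLM judging, stubbed here as a crude token-overlap proxy."""
    pred = prediction.strip().lower()
    ref = item.reference.strip().lower()
    if item.qtype in ("true/false", "mcq"):
        return float(pred == ref)
    ref_tokens = set(ref.split())
    return len(ref_tokens & set(pred.split())) / max(len(ref_tokens), 1)

def per_language_accuracy(items, predictions):
    """Aggregate mean score per language, as a multilingual benchmark would."""
    totals, counts = defaultdict(float), defaultdict(int)
    for item, pred in zip(items, predictions):
        totals[item.language] += score(item, pred)
        counts[item.language] += 1
    return {lang: totals[lang] / counts[lang] for lang in totals}

# Hypothetical sample items and model predictions.
items = [
    BenchItem("Swahili", "Food", "mcq", "Which dish is shown?", "B"),
    BenchItem("Swahili", "Festivals", "true/false", "Is this Eid?", "true"),
    BenchItem("Urdu", "Heritage", "open-short", "Name the monument.", "badshahi mosque"),
]
preds = ["B", "false", "Badshahi Mosque"]
print(per_language_accuracy(items, preds))
```

Grouping scores by language (rather than reporting one global number) is what exposes the low-resource-language gaps the paper highlights.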
👥 Authors

Ashmal Vayani · University of Central Florida · Computer Vision, MultiModality, Large Language Models, Responsible AI
Dinura Dissanayake · Research Engineer, MBZUAI · Computer Vision, Reasoning
Hasindri Watawana · Mohamed bin Zayed University of AI
Noor Ahsan · Research Engineer · Remote Sensing, Geospatial Data, Machine Learning Research
Nevasini Sasikumar · Mohamed bin Zayed University of AI
Omkar Thawakar · MBZUAI, UAE · Computer Vision, Machine Learning, Generative AI, LLM, Foundation Models
Henok Biadglign Ademtew · Researcher · Deep Learning, Multimodal, NLP
Yahya Hmaiti
Amandeep Kumar · Ph.D. Student, The Johns Hopkins University · Deep Learning, Computer Vision, Pattern Recognition
Kartik Kuckreja
Mykola Maslych · CS PhD candidate at ISUE Lab, University of Central Florida · Human-Computer Interaction, 3DUI, Gestural Interfaces, Virtual Reality, Applied Machine Learning
Wafa Al Ghallabi · PhD student · Computer Vision, VLM
M. Mihaylov
Chao Qin
Abdelrahman M. Shaker
Mike Zhang · Aalborg University (Copenhagen) · Artificial Intelligence, Natural Language Processing, Information Extraction, NLP Applications
Mahardika Krisna Ihsani · MBZUAI · Natural Language Processing, Machine Learning, Interpretability, Computational Linguistics
Amiel Esplana
Monil Gokani
Shachar Mirkin
Harsh Singh
Ashay Srivastava
Endre Hamerlik · PhD Student @ Comenius University in Bratislava · neural networks, nlp, green energy
Fathinah Asma Izzati
F. Maani
Sebastian Cavada
Jenny Chim · Queen Mary University of London · natural language processing, computational linguistics
Rohit Gupta
Sanjay Manjunath
Kamila Zhumakhanova
F. H. Rabevohitra
A. Amirudin
Muhammad Ridzuan · Mohamed bin Zayed University of Artificial Intelligence · AI for Healthcare, Machine Learning, Deep Learning, Computer Vision, Geology
D. Kareem
Ketan More · MBZUAI · Computer Vision
Kunyang Li
Pramesh Shakya
Muhammad Saad · X (formerly Twitter) · Cybersecurity, Systems Security, Fraud Detection
Amirpouya Ghasemaghaei · CS PhD Student, University of Central Florida · Virtual Reality, 3D User Interfaces, Generative AI, Large Language Models
Amirbek Djanibekov · PhD Student, MBZUAI · Natural Language Processing, Speech Processing
Dilshod Azizov · MBZUAI · Machine Learning, NLP, Computer Vision
Branislava Jankovic · PhD at MBZUAI · AI, Computer Vision, Machine Learning, Computational Biology, IoT
Naman Bhatia
Alvaro Cabrera
Johan Obando-Ceron · Mila, University of Montreal · Deep Learning, Reinforcement Learning, Machine Learning, Artificial Intelligence
Olympiah Otieno
Fabian Farestam · ETH Zürich · games on graphs, llm evaluations
Muztoba Rabbani
Sanoojan Baliah · Research Associate · Visual Generation, Domain generalization, Computer vision, Machine learning
Santosh Sanjeev · Technology Innovation Institute · Multimodality, Vision Language Models, AI for healthcare, Generative AI
A. Shtanchaev
Maheen Fatima
Thao Nguyen
Amrin Kareem
Toluwani Aremu · MBZUAI · AI Safety, Trustworthy AI, Responsible AI
Nathan Xavier
Amit Bhatkal
H. Toyin
Aman Chadha · GenAI Leadership @ Apple • Stanford AI • UW-Madison ECE • Ex: Apple, AWS, Alexa, Nvidia · Multimodal AI, Natural Language Processing, Computer Vision, Speech Processing, Recommender Systems
Hisham Cholakkal · Mohamed bin Zayed University of Artificial Intelligence (MBZUAI) · Computer Vision, Large Multimodal Models, LLM, Healthcare Foundation Model, Conversational Assistant
R. Anwer · Mohamed bin Zayed University of AI, Australian National University
Michael Felsberg · Professor of Computer Vision, Linköping University · Computer Vision, Machine Learning, Robot Vision
J. Laaksonen · Aalto University
T. Solorio · Mohamed bin Zayed University of AI
Monojit Choudhury · Professor of Natural Language Processing, MBZUAI · Natural Language Processing, Large Language Models, Ethics of AI, Computational Social Science
Ivan Laptev · Professor at MBZUAI, on leave from INRIA · Computer Vision, Robotics, Action Recognition, Object Recognition
Mubarak Shah · Trustee Chair Professor of Computer Science, University of Central Florida · Computer Vision
Salman Khan · Mohamed bin Zayed University of AI
F. Khan · Mohamed bin Zayed University of AI, Linköping University