Pearl: A Multimodal Culturally-Aware Arabic Instruction Dataset

📅 2025-05-28
🤖 AI Summary
Problem: Existing large vision-language models (LVLMs) exhibit pervasive cultural biases and, in particular, lack robust Arabic cultural grounding due to the absence of high-quality, culturally rich Arabic multimodal data. Method: We introduce Pearl, the first large-scale, culture-aware Arabic multimodal instruction dataset, covering all 22 Arab states and ten major cultural domains. We propose a novel fine-grained Arabic cultural annotation framework; design the Pearl-X subset to quantify regional cultural variability; and combine agent-driven data generation, cross-regional collaborative annotation by 45 annotators, and a three-tier benchmark suite (Pearl, Pearl-Lite, Pearl-X). Contribution/Results: Empirical analysis shows that instruction alignment improves cultural grounding significantly more than model scaling alone. Comprehensive evaluation across leading open- and closed-source multimodal LLMs shows substantial gains in cultural understanding and reasoning. All data, annotations, and benchmarks are publicly released.

📝 Abstract
Mainstream large vision-language models (LVLMs) inherently encode cultural biases, highlighting the need for diverse multimodal datasets. To address this gap, we introduce Pearl, a large-scale Arabic multimodal dataset and benchmark explicitly designed for cultural understanding. Constructed through advanced agentic workflows and extensive human-in-the-loop annotations by 45 annotators from across the Arab world, Pearl comprises over K multimodal examples spanning ten culturally significant domains and covering all Arab countries. We further provide two robust evaluation benchmarks, Pearl and Pearl-Lite, along with a specialized subset, Pearl-X, explicitly developed to assess nuanced cultural variations. Comprehensive evaluations on state-of-the-art open and proprietary LVLMs demonstrate that reasoning-centric instruction alignment substantially improves models' cultural grounding compared to conventional scaling methods. Pearl establishes a foundational resource for advancing culturally informed multimodal modeling research. All datasets and benchmarks are publicly available.
Problem

Research questions and friction points this paper is trying to address.

Addressing cultural biases in large vision-language models
Introducing a culturally-aware Arabic multimodal dataset
Improving cultural grounding via reasoning-centric instruction alignment
Innovation

Methods, ideas, or system contributions that make the work stand out.

Large-scale Arabic multimodal dataset Pearl
Agentic workflows and human annotations
Reasoning-centric instruction alignment improves models' cultural grounding
Fakhraddin Alwajih
Postdoctoral Fellow Researcher @ UBC
Artificial Intelligence, Machine Learning, Natural Language Processing
Samar M. Magdy
The University of British Columbia
Linguistics, Computational Linguistics, NLP
Abdellah El Mekki
The University of British Columbia
Omer Nacar
Prince Sultan University
Youssef Nafea
Masters Student at MBZUAI
Deep Learning, LLMs, Natural Language Processing, Speech Processing
Safaa Taher Abdelfadil
Cairo University
Abdulfattah Mohammed Yahya
KFUPM
Hamzah Luqman
Associate Professor, King Fahd University of Petroleum and Minerals (KFUPM)
Computer Vision, Arabic Natural Language Processing
Nada Almarwani
Assistant Professor
NLP, AI
Samah Aloufi
Assistant Professor, Taibah University
AI, Social multimedia mining and analysis, multimedia retrieval and summarization, Multimedia recommendation, Machine learning
Baraah Qawasmeh
WMU
Houdaifa Atou
UM6P
Serry Sibaee
Research Engineer
Arabic Natural Language Processing, NLP
Hamzah A. Alsayadi
Ain Shams University
Walid Al-Dhabyani
Cairo University
Maged S. Al-shaibani
KFUPM
Aya El Aatar
UCA
Nour Qandos
Technology, Information and Internet (Qafza)
Rahaf Alhamouri
Birzeit University
Samar Ahmad
KAUST
Razan Khassib
Birzeit University
Lina Hamad
Birzeit University
Mohammed Anwar Al-Ghrawi
Damascus University
Fatimah Alshamari
Taibah University
Cheikh Malainine
University of Nouakchott
Doaa Qawasmeh
BAU
Aminetou Yacoub
University of Nouakchott
Tfeil Moilid
University of Nouakchott
Ruwa AbuHweidi
Birzeit University
Ahmed Aboeitta
Master's Student
Vatimetou Mohamed Lemin
University of Nouakchott
Reem Abdel-Salam
MSc student at Faculty of Engineering, Computer Department, Cairo University
Deep learning, Computer Vision, Image Processing
Ahlam Bashiti
Birzeit University
Adel Ammar
Full Professor, Senior researcher at RIOTU Lab, Prince Sultan University, Riyadh.
Artificial Intelligence, Machine Learning, Deep Learning, LLMs, NLP
Aisha Alansari
Graduate Assistant, Information and Computer Science Department, KFUPM
Machine Learning, Natural Language Processing, Deep Learning, LLMs
Ahmed Ashraf
KFUPM
Nora Alturayeif
Imam Abdulrahman Bin Faisal University
Sara Shatnawi
BAU
Alcides Alcoba Inciarte
Research Assistant, The University of British Columbia
Natural Language Processing
Abdelrahim A. Elmadany
The University of British Columbia
Mohamedou Cheikh Tourad
University of Nouakchott
Ismail Berrada
UM6P
Mustafa Jarrar
Professor, Hamad Bin Khalifa University, Qatar - Birzeit University, Palestine
Arabic Natural Language Processing, Social Computing, Ontology Engineering, Knowledge Graphs
Shady Shehata
University of Waterloo
Artificial Intelligence, Natural Language Processing
Muhammad Abdul-Mageed
The University of British Columbia
Natural Language Processing, Deep Learning