🤖 AI Summary
Existing large vision-language models (LVLMs) exhibit pervasive cultural biases and, in particular, lack robust Arabic cultural grounding due to the absence of high-quality, culturally rich Arabic multimodal data.
Method: We introduce Pearl, the first large-scale, culture-aware Arabic multimodal instruction dataset, covering all 22 Arab states and ten major cultural domains. We propose a novel fine-grained Arabic cultural annotation framework and design the Pearl-X subset to quantify regional cultural variability. The construction pipeline combines agent-driven data generation with cross-regional collaborative annotation by 45 annotators, and the release includes a three-tier benchmark suite (Pearl, Pearl-Lite, Pearl-X).
Contribution/Results: Empirical analysis demonstrates that reasoning-centric instruction alignment improves cultural grounding more than model scaling alone. Comprehensive evaluation of leading open- and closed-source LVLMs shows substantial gains in cultural understanding and reasoning. All data, annotations, and benchmarks are publicly released.
📝 Abstract
Mainstream large vision-language models (LVLMs) inherently encode cultural biases, highlighting the need for diverse multimodal datasets. To address this gap, we introduce Pearl, a large-scale Arabic multimodal dataset and benchmark explicitly designed for cultural understanding. Constructed through advanced agentic workflows and extensive human-in-the-loop annotations by 45 annotators from across the Arab world, Pearl comprises over K multimodal examples spanning ten culturally significant domains and covering all Arab countries. We further provide two robust evaluation benchmarks, Pearl and Pearl-Lite, along with a specialized subset, Pearl-X, explicitly developed to assess nuanced cultural variations. Comprehensive evaluations on state-of-the-art open-source and proprietary LVLMs demonstrate that reasoning-centric instruction alignment substantially improves models' cultural grounding compared to conventional scaling methods. Pearl establishes a foundational resource for advancing culturally informed multimodal modeling research. All datasets and benchmarks are publicly available.
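Since the datasets and benchmarks are stated to be publicly released, a minimal sketch of accessing them is shown below, assuming distribution via the Hugging Face Hub. The repository ID and configuration names are placeholders (the abstract does not specify them), so treat this as illustrative only.

```python
# Minimal sketch: loading a Pearl benchmark split with the Hugging Face
# `datasets` library. NOTE: the repository ID and config/split names below
# are hypothetical placeholders; consult the official release page for the
# actual Hub location and configuration names.
from datasets import load_dataset

PEARL_REPO = "<org>/Pearl"  # hypothetical Hub ID; replace with the real one

# Load the lightweight evaluation tier (hypothetical config name).
pearl_lite = load_dataset(PEARL_REPO, name="pearl-lite", split="test")

# Each example is expected to pair an image with a culturally grounded
# instruction/question and a reference answer.
for example in pearl_lite.select(range(3)):
    print(example)
```

Under this assumption, the three benchmark tiers (Pearl, Pearl-Lite, Pearl-X) would map to separate configurations of the same Hub repository.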