CrossCult-KIBench: A Benchmark for Cross-Cultural Knowledge Insertion in MLLMs

📅 2026-05-07

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses culturally inappropriate responses from multimodal large language models (MLLMs), which are trained predominantly on English-centric data and therefore struggle in cross-cultural contexts. The study introduces the task of "cross-cultural knowledge insertion" and presents CrossCult-KIBench, the first fine-grained evaluation benchmark for it, covering English, Chinese, and Arabic language-culture groups with 49 cultural visual scenarios and 9,800 image-grounded cases. It further proposes an evaluation dimension that jointly considers appropriateness for the target culture and preservation of the model's behavior in non-target cultures. As a baseline, the authors propose MCKI, an external-memory retrieval method that uses frozen MLLM representations to find relevant cultural knowledge and prepends matched entries as conditional prompts. Experiments show that existing approaches fail to balance cultural adaptation with retention of original model behavior, highlighting a key challenge in building culturally sensitive MLLMs.

📝 Abstract

Multimodal Large Language Models (MLLMs), trained primarily on English-centric data, frequently generate culturally inappropriate or misaligned responses in cross-cultural settings. To mitigate this, we introduce the task of cross-cultural knowledge insertion, which focuses on adapting models to specific cultural contexts while preserving their original behavior in other cultures. To facilitate research in this area, we introduce CrossCult-KIBench, a comprehensive evaluation benchmark for assessing both the effectiveness of knowledge insertion and its unintended side effects on non-target cultures. The benchmark includes 9,800 image-grounded cases covering 49 culturally relevant visual scenarios across English, Chinese, and Arabic language-culture groups. It supports evaluation in both single-insert and sequential-insert settings. We also propose Memory-Conditioned Knowledge Insertion (MCKI) as a baseline method. MCKI retrieves relevant cultural knowledge from an external memory using frozen MLLM representations, prepending matched entries as conditional prompts when applicable. Extensive experiments on CrossCult-KIBench reveal that current approaches struggle to balance effective cultural adaptation with behavioral preservation, highlighting a key challenge in developing culturally aware MLLMs. Our work thus underscores an important research direction for developing more culturally adaptive and responsible MLLMs.
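
The retrieve-then-prepend idea behind MCKI can be illustrated with a minimal sketch. This is not the authors' implementation: the names `MemoryEntry`, `cosine`, and `build_prompt`, the cosine-similarity matching rule, the 0.8 threshold, and the prompt template are all assumptions made for illustration, and the query vectors stand in for representations that would come from a frozen MLLM.

```python
import math
from dataclasses import dataclass


@dataclass
class MemoryEntry:
    # Hypothetical memory record: a key vector (assumed to be produced by a
    # frozen encoder) paired with the cultural knowledge text it indexes.
    key: list
    knowledge: str


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_prompt(query_vec, question, memory, threshold=0.8):
    """Prepend the best-matching memory entry only when it is similar enough.

    If nothing in memory clears the (assumed) threshold, the question is
    returned unchanged, so the base model's behavior is left untouched.
    """
    best = max(memory, key=lambda e: cosine(query_vec, e.key), default=None)
    if best is None or cosine(query_vec, best.key) < threshold:
        return question
    return f"Cultural context: {best.knowledge}\n{question}"


# Toy usage with made-up 2-d vectors and placeholder knowledge text.
memory = [MemoryEntry(key=[1.0, 0.0], knowledge="<inserted cultural knowledge>")]
matched = build_prompt([0.9, 0.1], "What is shown in the image?", memory)
unmatched = build_prompt([0.0, 1.0], "What is shown in the image?", memory)
```

In this sketch the threshold is what separates adaptation from preservation: queries close to an inserted case receive the extra context, while unrelated queries pass through unmodified.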

Problem

Research questions and friction points this paper is trying to address.

cross-cultural knowledge insertion

Multimodal Large Language Models

cultural adaptation

behavioral preservation

cross-cultural evaluation

Innovation

Methods, ideas, or system contributions that make the work stand out.

cross-cultural knowledge insertion

Multimodal Large Language Models

CrossCult-KIBench