🤖 AI Summary
This study addresses the limited cross-cultural fairness and robustness of current vision-language models in detecting hateful memes, stemming from training data that predominantly reflects Western contexts. The authors develop a systematic evaluation framework to analyze the performance of mainstream models on multilingual memes across three dimensions: learning strategies, prompt language, and translation effects, uncovering significant performance degradation when detection follows machine translation. To mitigate the models' systematic bias toward Western safety norms, they propose a culturally aligned intervention combining native-language prompting with one-shot learning. Experimental results demonstrate that this approach substantially improves cross-cultural detection accuracy, offering a practical pathway toward globally robust multimodal content moderation systems.
📝 Abstract
Cultural context profoundly shapes how people interpret online content, yet vision-language models (VLMs) remain predominantly trained through Western or English-centric lenses. This limits their fairness and cross-cultural robustness in tasks like hateful meme detection. We introduce a systematic evaluation framework designed to diagnose and quantify the cross-cultural robustness of state-of-the-art VLMs across multilingual meme datasets, analyzing three axes: (i) learning strategy (zero-shot vs. one-shot), (ii) prompting language (native vs. English), and (iii) translation effects on meaning and detection. Results show that the common ``translate-then-detect'' approach degrades performance, while culturally aligned interventions (native-language prompting and one-shot learning) significantly enhance detection. Our findings reveal a systematic convergence toward Western safety norms and provide actionable strategies to mitigate this bias, guiding the design of globally robust multimodal moderation systems.
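For concreteness, the sketch below enumerates the eight experimental conditions implied by the three evaluation axes and assembles a prompt for the culturally aligned intervention (native-language prompting plus one-shot learning). This is a minimal illustration under assumed names, not the authors' implementation: `build_prompt`, the axis labels, and the placeholder strings are all hypothetical.

```python
# Minimal sketch (not the paper's code) of the evaluation grid implied by
# the three axes above, plus prompt assembly for the culturally aligned
# intervention. All names here (build_prompt, axis labels, placeholders)
# are illustrative assumptions.
from itertools import product
from typing import Optional

LEARNING = ("zero-shot", "one-shot")              # axis (i): learning strategy
PROMPT_LANG = ("native", "english")               # axis (ii): prompting language
TRANSLATION = ("none", "translate-then-detect")   # axis (iii): translation effect


def build_prompt(meme_text: str, lang: str, shot: str,
                 example: Optional[str] = None) -> str:
    """Assemble a hateful-meme detection prompt for one grid condition."""
    # In the native condition the instruction would be written in the
    # meme's own language; an English placeholder is shown for brevity.
    instruction = "Is this meme hateful? Answer yes or no."
    parts = []
    if shot == "one-shot" and example is not None:
        parts.append(f"Labelled example:\n{example}")
    parts.append(f"Meme text: {meme_text}\n{instruction}")
    return "\n\n".join(parts)


# Enumerate the 2 x 2 x 2 = 8 conditions of the evaluation grid.
for shot, lang, trans in product(LEARNING, PROMPT_LANG, TRANSLATION):
    print(f"learning={shot}, prompt_lang={lang}, translation={trans}")

# Culturally aligned intervention: native-language prompt with one example.
print(build_prompt("<meme caption here>", lang="native", shot="one-shot",
                   example="<native-language labelled meme>"))
```

Under this framing, the ``translate-then-detect'' baseline corresponds to the `translation="translate-then-detect"` conditions, where the meme text is machine-translated into English before being passed to the model.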