Harm or Humor: A Multimodal, Multilingual Benchmark for Overt and Covert Harmful Humor

📅 2026-03-18
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the limitations of existing static benchmarks in detecting harmful humor that relies on cultural context and implicit cues, as well as their inability to distinguish between explicit and implicit harm. To bridge this gap, the authors introduce the first multimodal, multilingual benchmark supporting fine-grained classification into safe, explicitly harmful, and implicitly harmful categories across text, images, and videos. The benchmark spans English, Arabic, and culturally general contexts, with expert human annotations ensuring cultural sensitivity and support for deep reasoning. Systematic evaluation of leading large language models reveals that closed-source models significantly outperform open-source counterparts and exhibit notable performance disparities across languages, underscoring the critical role of cultural awareness and reasoning capabilities in effective safety alignment.

📝 Abstract
Dark humor often relies on subtle cultural nuances and implicit cues that require contextual reasoning to interpret, posing safety challenges that current static benchmarks fail to capture. To address this, we introduce a novel multimodal, multilingual benchmark for detecting and understanding harmful and offensive humor. Our manually curated dataset comprises 3,000 texts and 6,000 images in English and Arabic, alongside 1,200 videos that span English, Arabic, and language-independent (universal) contexts. Unlike standard toxicity datasets, we enforce a strict annotation guideline: distinguishing Safe jokes from Harmful ones, with the latter further classified into Explicit (overt) and Implicit (covert) categories to probe deep reasoning. We systematically evaluate state-of-the-art (SOTA) open and closed-source models across all modalities. Our findings reveal that closed-source models significantly outperform open-source ones, with a notable performance gap between English and Arabic in both, underscoring the critical need for culturally grounded, reasoning-aware safety alignment. Warning: this paper contains example data that may be offensive, harmful, or biased.
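The three-way label scheme described in the abstract (Safe vs. Explicitly Harmful vs. Implicitly Harmful, across three modalities and three language contexts) can be sketched as a minimal data model. The class names, field names, and string values below are illustrative assumptions for exposition, not the authors' released dataset format:

```python
from dataclasses import dataclass
from enum import Enum

class HumorLabel(Enum):
    SAFE = "safe"
    EXPLICIT_HARM = "explicitly_harmful"   # overt harm, surface-level cues
    IMPLICIT_HARM = "implicitly_harmful"   # covert harm, needs contextual/cultural reasoning

@dataclass
class BenchmarkItem:
    item_id: str
    modality: str    # assumed one of: "text", "image", "video"
    language: str    # assumed one of: "en", "ar", "universal" (videos only, per the abstract)
    label: HumorLabel

# Hypothetical example record
item = BenchmarkItem("ex-001", "text", "ar", HumorLabel.IMPLICIT_HARM)
print(item.label.value)  # -> implicitly_harmful
```

A flat enum keeps Safe as a sibling of the two harmful classes; an evaluator can collapse the harmful pair for binary safe/harmful scoring or keep all three for the fine-grained setting the paper probes.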
Problem

Research questions and friction points this paper is trying to address.

harmful humor
multimodal
multilingual
implicit toxicity
safety alignment
Innovation

Methods, ideas, or system contributions that make the work stand out.

multimodal benchmark
harmful humor detection
implicit vs explicit toxicity
multilingual safety evaluation
contextual reasoning
Ahmed Sharshar, Hosam Elgendy, Saad El Dine Ahmed, Yasser Rohaim, and Yuxia Wang
Mohamed bin Zayed University of Artificial Intelligence (MBZUAI), Abu Dhabi, UAE
Natural Language Processing