Can Thinking Models Think to Detect Hateful Memes?

📅 2026-03-01
📈 Citations: 0
✨ Influential: 0
📄 PDF
🤖 AI Summary
Detecting hateful memes requires a precise understanding of the implicit harmful intent conveyed through the interplay of text and image, posing a significant challenge for fine-grained multimodal reasoning. This work proposes a reinforcement learning-based post-training framework that jointly optimizes classification performance and explanation quality under weak supervision, integrating chain-of-thought distillation with a newly designed Group Relative Policy Optimization (GRPO) objective. Evaluated on the Hateful Memes benchmark, the method achieves state-of-the-art results, improving accuracy and F1 score by approximately 1% and explanation quality by about 3%. These gains advance the interpretability and robustness of multimodal large language models in detecting hateful content.

๐Ÿ“ Abstract
Hateful memes often require compositional multimodal reasoning: the image and text may each appear benign in isolation, yet their interaction conveys harmful intent. Although thinking-based multimodal large language models (MLLMs) have recently advanced vision-language understanding, their capabilities remain underexplored for hateful meme analysis. We propose a reinforcement learning-based post-training framework that improves reasoning in thinking-based MLLMs through task-specific rewards and a novel Group Relative Policy Optimization (GRPO) objective. Specifically, we (i) conduct a systematic empirical study of off-the-shelf MLLMs for hateful meme understanding, (ii) extend an existing hateful meme dataset by generating weakly or pseudo-supervised chain-of-thought rationales via distillation, and (iii) introduce a GRPO-based objective that jointly optimizes meme classification and explanation quality to encourage fine-grained, step-by-step reasoning. Experiments on the Hateful Memes benchmark show that our approach achieves state-of-the-art performance, improving accuracy and F1 by approximately 1% and explanation quality by approximately 3%. We will publicly release our code, dataset extensions, and evaluation resources to support reproducibility.
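The GRPO objective described in the abstract scores several sampled responses per meme and normalizes their rewards within the group, so the policy is reinforced toward responses that beat the group average. The sketch below illustrates that group-relative advantage step with a blended classification-plus-explanation reward; the reward weights, function names, and example scores are illustrative assumptions, not the paper's actual implementation.

```python
# Hypothetical sketch of a GRPO-style advantage computation for joint
# classification + explanation rewards. All weights and scores are
# assumptions for illustration only.
from statistics import mean, pstdev


def combined_reward(correct: bool, explanation_score: float,
                    w_cls: float = 1.0, w_expl: float = 0.5) -> float:
    """Blend a binary classification reward with an explanation-quality
    score in [0, 1]; the 1.0/0.5 weighting is an assumed choice."""
    return w_cls * float(correct) + w_expl * explanation_score


def group_relative_advantages(rewards: list[float]) -> list[float]:
    """GRPO's core step: z-normalize rewards within one sampled group,
    so each response's advantage is relative to its group's mean."""
    mu = mean(rewards)
    sigma = pstdev(rewards) or 1.0  # guard against a uniform-reward group
    return [(r - mu) / sigma for r in rewards]


# Example: four sampled responses for the same meme.
rewards = [
    combined_reward(True, 0.9),   # correct label, strong rationale
    combined_reward(True, 0.4),   # correct label, weak rationale
    combined_reward(False, 0.7),  # wrong label, plausible rationale
    combined_reward(False, 0.1),  # wrong label, poor rationale
]
advantages = group_relative_advantages(rewards)
```

Because advantages are centered within the group, they sum to zero: above-average responses receive positive advantage and are up-weighted in the policy update, without requiring a separate learned value function.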
Problem

Research questions and friction points this paper is trying to address.

hateful memes
multimodal reasoning
multimodal large language models
compositional reasoning
vision-language understanding
Innovation

Methods, ideas, or system contributions that make the work stand out.

multimodal reasoning
reinforcement learning
chain-of-thought
Group Relative Policy Optimization
hateful meme detection
Mohamed Bayan Kmainasi
Qatar University
Mucahid Kutlu
Assistant Professor, Qatar University
Information Retrieval, Natural Language Processing
Ali Ezzat Shahroor
Qatar Computing Research Institute
Abul Hasnat
APAVI.AI; Blackbird.AI
Firoj Alam
Qatar Computing Research Institute