Can Thinking Models Think to Detect Hateful Memes?

📅 2026-03-01
📈 Citations: 0
✨ Influential: 0
📄 PDF
🤖 AI Summary
Detecting hateful memes requires a precise understanding of the implicit harmful intent conveyed through the interplay of text and image, posing a significant challenge for fine-grained multimodal reasoning. This work proposes a reinforcement learning-based post-training framework that jointly optimizes classification performance and explanation quality under weak supervision, integrating chain-of-thought distillation with a newly designed Group Relative Policy Optimization (GRPO) objective. Evaluated on the Hateful Memes benchmark, the method achieves state-of-the-art results, improving accuracy and F1 score by approximately 1% and explanation quality by about 3%. These gains advance the interpretability and robustness of multimodal large language models in detecting hateful content.

๐Ÿ“ Abstract
Hateful memes often require compositional multimodal reasoning: the image and text may each appear benign in isolation, yet their interaction conveys harmful intent. Although thinking-based multimodal large language models (MLLMs) have recently advanced vision-language understanding, their capabilities remain underexplored for hateful meme analysis. We propose a reinforcement learning-based post-training framework that improves reasoning in thinking-based MLLMs through task-specific rewards and a novel Group Relative Policy Optimization (GRPO) objective. Specifically, we (i) conduct a systematic empirical study of off-the-shelf MLLMs for hateful meme understanding, (ii) extend an existing hateful meme dataset by generating weakly or pseudo-supervised chain-of-thought rationales via distillation, and (iii) introduce a GRPO-based objective that jointly optimizes meme classification and explanation quality to encourage fine-grained, step-by-step reasoning. Experiments on the Hateful Memes benchmark show that our approach achieves state-of-the-art performance, improving accuracy and F1 by approximately 1% and explanation quality by approximately 3%. We will publicly release our code, dataset extensions, and evaluation resources to support reproducibility.
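The GRPO objective described in the abstract scores several sampled responses per meme and normalizes their rewards within the group, so the policy is reinforced toward responses that beat the group average. The sketch below illustrates that group-relative advantage step with a blended classification-plus-explanation reward; the reward weights, function names, and example scores are illustrative assumptions, not the paper's actual implementation.

```python
# Hypothetical sketch of a GRPO-style advantage computation for joint
# classification + explanation rewards. All weights and scores are
# assumptions for illustration only.
from statistics import mean, pstdev


def combined_reward(correct: bool, explanation_score: float,
                    w_cls: float = 1.0, w_expl: float = 0.5) -> float:
    """Blend a binary classification reward with an explanation-quality
    score in [0, 1]; the 1.0/0.5 weighting is an assumed choice."""
    return w_cls * float(correct) + w_expl * explanation_score


def group_relative_advantages(rewards: list[float]) -> list[float]:
    """GRPO's core step: z-normalize rewards within one sampled group,
    so each response's advantage is relative to its group's mean."""
    mu = mean(rewards)
    sigma = pstdev(rewards) or 1.0  # guard against a uniform-reward group
    return [(r - mu) / sigma for r in rewards]


# Example: four sampled responses for the same meme.
rewards = [
    combined_reward(True, 0.9),   # correct label, strong rationale
    combined_reward(True, 0.4),   # correct label, weak rationale
    combined_reward(False, 0.7),  # wrong label, plausible rationale
    combined_reward(False, 0.1),  # wrong label, poor rationale
]
advantages = group_relative_advantages(rewards)
```

Because advantages are centered within the group, they sum to zero: above-average responses receive positive advantage and are up-weighted in the policy update, without requiring a separate learned value function.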
Problem

Research questions and friction points this paper is trying to address.

hateful memes
multimodal reasoning
multimodal large language models
compositional reasoning
vision-language understanding
Innovation

Methods, ideas, or system contributions that make the work stand out.

multimodal reasoning
reinforcement learning
chain-of-thought
Group Relative Policy Optimization
hateful meme detection
Mohamed Bayan Kmainasi
Qatar University
Mucahid Kutlu
Assistant Professor, Qatar University
Information Retrieval, Natural Language Processing
Ali Ezzat Shahroor
Qatar Computing Research Institute
Abul Hasnat
APAVI.AI; Blackbird.AI
Firoj Alam
Qatar Computing Research Institute