MemeMind: A Large-Scale Multimodal Dataset with Chain-of-Thought Reasoning for Harmful Meme Detection

📅 2025-06-15
📈 Citations: 0
Influential: 0
🤖 AI Summary
Automated detection of harmful memes—characterized by multimodal fusion (image-text) and semantically obfuscated content—remains hampered by complex cross-modal interactions and a scarcity of high-quality benchmark datasets. To address this, we introduce MemeMind, the first large-scale, multilingual (Chinese-English), highly diverse multimodal dataset explicitly annotated with chain-of-thought (CoT) reasoning traces for harmful meme detection. Crucially, we pioneer an annotation paradigm that treats interpretable, stepwise reasoning paths as integral labels. Building upon this, we propose MemeGuard, a novel detection framework that jointly leverages multimodal representation learning and CoT-guided deep reasoning. Extensive experiments demonstrate that MemeGuard significantly outperforms state-of-the-art methods on MemeMind, achieving substantial accuracy gains while providing end-to-end interpretable decision rationales. This work establishes both a new benchmark and a principled, reasoning-aware paradigm for robust and transparent harmful meme detection.

📝 Abstract
The rapid development of social media has intensified the spread of harmful content. Harmful memes, which integrate both images and text, pose significant challenges for automated detection due to their implicit semantics and complex multimodal interactions. Although existing research has made progress in detection accuracy and interpretability, the lack of a systematic, large-scale, diverse, and highly explainable dataset continues to hinder further advancement in this field. To address this gap, we introduce MemeMind, a novel dataset featuring scientifically rigorous standards, large scale, diversity, bilingual support (Chinese and English), and detailed Chain-of-Thought (CoT) annotations. MemeMind fills critical gaps in current datasets by offering comprehensive labeling and explicit reasoning traces, thereby providing a solid foundation for enhancing harmful meme detection. In addition, we propose an innovative detection framework, MemeGuard, which effectively integrates multimodal information with reasoning process modeling, significantly improving models' ability to understand and identify harmful memes. Extensive experiments conducted on the MemeMind dataset demonstrate that MemeGuard consistently outperforms existing state-of-the-art methods in harmful meme detection tasks.
Problem

Research questions and friction points this paper is trying to address.

Detecting harmful memes whose implicit semantics emerge from complex multimodal (image-text) interactions
Lack of large-scale, diverse, explainable datasets for harmful meme detection
Improving both model accuracy and interpretability in harmful meme identification
Innovation

Methods, ideas, or system contributions that make the work stand out.

Large-scale multimodal dataset with CoT reasoning
Bilingual support with detailed annotations
MemeGuard integrates multimodal and reasoning modeling
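To make the CoT-annotated dataset idea above concrete, here is a minimal sketch of what a single MemeMind-style record might look like. The field names and reasoning steps are illustrative assumptions, not the dataset's published schema.

```python
# Hypothetical sketch of one record in a CoT-annotated harmful-meme dataset.
# All field names below are assumptions for illustration only.
example_record = {
    "meme_id": "meme_000123",
    "image_path": "images/meme_000123.jpg",
    "text": "overlaid caption extracted from the image",
    "language": "en",            # bilingual dataset: "en" or "zh"
    "label": "harmful",          # harmfulness label
    "cot_reasoning": [           # stepwise chain-of-thought reasoning trace
        "Step 1: Describe the visual content of the image.",
        "Step 2: Interpret the overlaid text literally and figuratively.",
        "Step 3: Analyze the image-text interaction for implicit meaning.",
        "Step 4: Judge harmfulness and state the rationale.",
    ],
}

def is_valid_record(rec: dict) -> bool:
    """Minimal sanity check: required fields present and CoT trace non-empty."""
    required = {"meme_id", "image_path", "text", "language", "label", "cot_reasoning"}
    return required <= rec.keys() and len(rec["cot_reasoning"]) > 0

print(is_valid_record(example_record))  # prints: True
```

The key design point, per the paper's framing, is that the reasoning trace is part of the label itself, so a detector like MemeGuard can be supervised on both the final decision and the intermediate steps.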
👥 Authors
Hexiang Gu
Beijing University of Posts and Telecommunications, Beijing, China
Qifan Yu
Zhejiang University
MLLM · multimodal learning · image generation & editing
Saihui Hou
Beijing Normal University
Deep Learning · Computer Vision · Multimodal Large Language Models
Zhiqin Fang
Beijing University of Posts and Telecommunications
MLLMs
Huijia Wu
Beijing University of Posts and Telecommunications, Beijing, China
Zhaofeng He
Beijing University of Posts and Telecommunications, Beijing, China