Detecting Harmful Memes with Decoupled Understanding and Guided CoT Reasoning

📅 2025-06-10
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
To address the high computational cost, poor generalizability, and limited interpretability of multimodal harmful meme detection, this paper proposes a decoupled understanding framework. First, a high-fidelity meme-to-text pipeline converts each meme into a semantically faithful, detail-preserving textual description; a lightweight large language model (LLM) then reasons over these descriptions together with manually curated, structured harmfulness guidelines via zero-shot chain-of-thought (CoT) prompting. The method thus balances efficiency, cross-domain adaptability, and decision transparency. Evaluated on seven benchmark datasets, it substantially outperforms state-of-the-art approaches and enables cross-platform, cross-regional, and cross-temporal transfer under low-resource conditions without fine-tuning, yielding a guideline-guided, lightweight, and interpretable meme safety detector that is deployable without parameter adaptation.

📝 Abstract
Detecting harmful memes is essential for maintaining the integrity of online environments. However, current approaches often struggle with resource efficiency, flexibility, or explainability, limiting their practical deployment in content moderation systems. To address these challenges, we introduce U-CoT+, a novel framework for harmful meme detection. Instead of relying solely on prompting or fine-tuning multimodal models, we first develop a high-fidelity meme-to-text pipeline that converts visual memes into detail-preserving textual descriptions. This design decouples meme interpretation from meme classification, thus avoiding immediate reasoning over complex raw visual content and enabling resource-efficient harmful meme detection with general large language models (LLMs). Building on these textual descriptions, we further incorporate targeted, interpretable human-crafted guidelines to guide models' reasoning under zero-shot CoT prompting. As such, this framework allows for easy adaptation to different harmfulness detection criteria across platforms, regions, and over time, offering high flexibility and explainability. Extensive experiments on seven benchmark datasets validate the effectiveness of our framework, highlighting its potential for explainable and low-resource harmful meme detection using small-scale LLMs. Codes and data are available at: https://anonymous.4open.science/r/HMC-AF2B/README.md.
Problem

Research questions and friction points this paper is trying to address.

Detecting harmful memes efficiently and flexibly
Improving explainability in harmful meme detection
Enabling low-resource detection with general LLMs
Innovation

Methods, ideas, or system contributions that make the work stand out.

Decouples meme interpretation from classification
Uses high-fidelity meme-to-text pipeline
Incorporates human-crafted guidelines for reasoning
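The two-stage design above (meme-to-text understanding, then guideline-guided zero-shot CoT classification) can be sketched as follows. This is a minimal illustration, not the paper's actual implementation: `caption_model`, `llm`, and the sample guidelines are hypothetical placeholders for any captioning model and small general-purpose LLM.

```python
# Hypothetical sketch of the U-CoT+ two-stage pipeline.
# All names here are illustrative placeholders, not the paper's API.

GUIDELINES = [
    "Flag content that demeans or dehumanizes a protected group.",
    "Flag content that celebrates violence or self-harm.",
]

def build_prompt(description: str, guidelines: list[str]) -> str:
    """Stage 2 prompt: guided zero-shot CoT over the meme's text description."""
    rules = "\n".join(f"- {g}" for g in guidelines)
    return (
        "Meme description:\n" + description + "\n\n"
        "Harmfulness guidelines:\n" + rules + "\n\n"
        "Let's think step by step, then answer 'harmful' or 'harmless'."
    )

def detect(meme_image, caption_model, llm, guidelines=GUIDELINES) -> str:
    # Stage 1: decouple understanding from classification by converting
    # the meme (image plus overlaid text) into a textual description.
    description = caption_model(meme_image)
    # Stage 2: guided CoT reasoning with a lightweight general LLM.
    answer = llm(build_prompt(description, guidelines))
    return "harmful" if "harmful" in answer.lower().split() else "harmless"
```

Because the guidelines are plain text passed in at inference time, adapting the detector to a new platform, region, or policy only requires editing the guideline list, with no fine-tuning.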
Fengjun Pan
Nanyang Technological University
A. Luu
Nanyang Technological University
Xiaobao Wu
Research Scientist, Nanyang Technological University
Large Language Models · Machine Learning · Natural Language Processing