MemeReaCon: Probing Contextual Meme Understanding in Large Vision-Language Models

📅 2025-05-23
📈 Citations: 0
Influential: 0
🤖 AI Summary
Current large vision-language models (LVLMs) exhibit limited capability in understanding context-dependent meme intent, often disregarding conversational cues or overemphasizing visual details, which leads to failures in identifying communicative purpose. To address this, we introduce MemeReaCon, a context-aware benchmark for meme understanding, specifically designed to evaluate how meme intent shifts across distinct Reddit discussion contexts. We construct a real-world multimodal dataset from five Reddit communities, comprising original posts, threaded comments, and community engagement signals, and formally define and quantify the task of context-dependent meme understanding. Our annotation schema includes fine-grained labels along three dimensions: intent, structural composition, and response alignment. Leveraging human-in-the-loop annotation and a context-sensitive evaluation protocol spanning intent recognition, image-text alignment, and community-aware modeling, we expose systematic deficiencies in LVLMs' cross-modal contextual integration. MemeReaCon provides a reproducible, quantitative foundation for diagnosing meme comprehension capabilities and guiding model improvement.

📝 Abstract
Memes have emerged as a popular form of multimodal online communication, where their interpretation heavily depends on the specific context in which they appear. Current approaches predominantly focus on isolated meme analysis, either for harmful content detection or standalone interpretation, overlooking a fundamental challenge: the same meme can express different intents depending on its conversational context. This oversight creates an evaluation gap: although humans intuitively recognize how context shapes meme interpretation, Large Vision Language Models (LVLMs) can hardly understand context-dependent meme intent. To address this critical limitation, we introduce MemeReaCon, a novel benchmark specifically designed to evaluate how LVLMs understand memes in their original context. We collected memes from five different Reddit communities, keeping each meme's image, the post text, and user comments together. We carefully labeled how the text and meme work together, what the poster intended, how the meme is structured, and how the community responded. Our tests with leading LVLMs show a clear weakness: models either fail to interpret critical information in the contexts, or overly focus on visual details while overlooking communicative purpose. MemeReaCon thus serves both as a diagnostic tool exposing current limitations and as a challenging benchmark to drive development toward more sophisticated, context-aware LVLMs.
Problem

Research questions and friction points this paper is trying to address.

Evaluating contextual meme understanding in large vision-language models
Assessing how meme interpretation changes based on conversational context
Testing LVLMs' ability to grasp context-dependent meme intent
Innovation

Methods, ideas, or system contributions that make the work stand out.

Contextual meme benchmark for LVLMs
Reddit multimodal data collection
Diagnostic tool for context-aware evaluation
Zhengyi Zhao
The Chinese University of Hong Kong
Natural Language Processing · Machine Learning · Information Extraction
Shubo Zhang
University of International Relations
Yuxi Zhang
University of Illinois, Urbana-Champaign
condensed matter physics
Yanxi Zhao
University of International Relations
Yifan Zhang
University of International Relations
Zezhong Wang
Institute of Science Tokyo
VLSI physical design
Huimin Wang
Jarvis Research Center, Tencent YouTu Lab
Yutian Zhao
Jarvis Research Center, Tencent YouTu Lab
Bin Liang
The Chinese University of Hong Kong
Yefeng Zheng
Professor, Westlake University, Hangzhou, China, IEEE Fellow, AIMBE Fellow
AI in Health · Medical Imaging · Computer Vision · Natural Language Processing · Large Language Model
Binyang Li
University of International Relations
Kam-Fai Wong
The Chinese University of Hong Kong
Xian Wu
Jarvis Research Center, Tencent YouTu Lab