🤖 AI Summary
This work proposes the "Meme Reappraisal" task, which aims to controllably transform the negative sentiment of internet memes into positive, constructive multimodal content while preserving their visual structure and semantic context. To facilitate research in this direction, we introduce MER-Bench, the first multimodal benchmark dataset specifically designed for emotion transformation with structural preservation, featuring fine-grained annotations for both affective states and compositional elements. We further present a generation and evaluation framework grounded in multimodal large language models (MLLMs). Using an MLLM-as-a-Judge paradigm for multidimensional automatic assessment, we find that current approaches fall significantly short in structural fidelity, semantic consistency, and affective transformation efficacy. The benchmark thereby lays the groundwork for controllable meme editing and emotion-aware multimodal generation.
📝 Abstract
Memes represent a tightly coupled, multimodal form of social expression, in which visual context and overlaid text jointly convey nuanced affect and commentary. Inspired by cognitive reappraisal in psychology, we introduce Meme Reappraisal, a novel multimodal generation task that aims to transform negatively framed memes into constructive ones while preserving their underlying scenario, entities, and structural layout. Unlike prior work on meme understanding or generation, Meme Reappraisal requires emotion-controllable, structure-preserving multimodal transformation under multiple semantic and stylistic constraints. To support this task, we construct MER-Bench, a benchmark of real-world memes with fine-grained multimodal annotations, including source and target emotions, positively rewritten meme text, visual editing specifications, and taxonomy labels covering visual type, sentiment polarity, and layout structure. We further propose a structured evaluation framework based on a multimodal large language model (MLLM)-as-a-Judge paradigm, decomposing performance into modality-level generation quality, affect controllability, structural fidelity, and global affective alignment. Extensive experiments across representative image-editing and multimodal-generation systems reveal substantial gaps in satisfying the constraints of structural preservation, semantic consistency, and affective transformation. We believe MER-Bench establishes a foundation for research on controllable meme editing and emotion-aware multimodal generation. Our code is available at https://github.com/one-seven17/MER-Bench.
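To make the annotation fields and the four evaluation dimensions listed in the abstract more concrete, the following is a minimal illustrative sketch. It is not taken from the MER-Bench repository: the class name `MemeAnnotation`, every field name, the dimension keys, and the unweighted-mean aggregation in `aggregate_judge_scores` are all assumptions made for illustration, and the actual schema and scoring procedure may differ.

```python
from dataclasses import dataclass
from statistics import mean


# Hypothetical record layout; field names are assumptions that mirror the
# annotation types listed in the abstract, not the real MER-Bench schema.
@dataclass
class MemeAnnotation:
    source_emotion: str       # negative emotion conveyed by the original meme
    target_emotion: str       # constructive emotion the rewrite should convey
    rewritten_text: str       # positively rewritten meme text
    visual_edit_spec: str     # description of the required visual edit
    visual_type: str          # taxonomy label: visual type
    sentiment_polarity: str   # taxonomy label: sentiment polarity
    layout_structure: str     # taxonomy label: layout structure


# The four evaluation dimensions named in the abstract (key names assumed).
DIMENSIONS = (
    "modality_generation_quality",
    "affect_controllability",
    "structural_fidelity",
    "global_affective_alignment",
)


def aggregate_judge_scores(scores: dict[str, float]) -> float:
    """Combine per-dimension judge scores into a single number.

    An unweighted mean is assumed here purely for illustration; the
    abstract does not specify how, or whether, dimensions are combined.
    """
    missing = [d for d in DIMENSIONS if d not in scores]
    if missing:
        raise ValueError(f"missing dimensions: {missing}")
    return mean(scores[d] for d in DIMENSIONS)
```

Keeping the dimensions as separate keys, rather than collapsing them at scoring time, reflects the abstract's emphasis on decomposed evaluation: a system can score well on generation quality while failing structural fidelity, and a single aggregate would hide that gap.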