Meme Similarity and Emotion Detection using Multimodal Analysis

📅 2025-03-21
🤖 AI Summary
This study addresses two gaps in meme understanding: image-text interactions are rarely modeled explicitly, and there is no systematic framework for measuring meme similarity or recognizing the emotions memes elicit. The authors propose a framework that analyzes the visual and textual modalities together. It combines CLIP-based multimodal embeddings, low-level visual features, and a DistilBERT text classifier to group similar memes and to classify them into six basic emotions, using the Reddit Meme Dataset and the Memotion Dataset. The CLIP-based similarity assessments were compared with human judgments in a 50-participant user study, yielding 67.23% agreement. The results indicate that anger and joy are the dominant emotions in memes and that motivational memes elicit stronger emotional responses.

📝 Abstract
Internet memes are a central element of online culture, blending images and text. While substantial research has focused on either the visual or textual components of memes, little attention has been given to their interplay. This gap raises a key question: What methodology can effectively compare memes and the emotions they elicit? Our study employs a multimodal methodological approach, analyzing both the visual and textual elements of memes. Specifically, we use a multimodal CLIP (Contrastive Language-Image Pre-training) model to group similar memes based on text and visual content embeddings, enabling robust similarity assessments across modalities. Using the Reddit Meme Dataset and Memotion Dataset, we extract low-level visual features and high-level semantic features to identify similar meme pairs. To validate these automated similarity assessments, we conducted a user study with 50 participants, asking them to provide yes/no responses regarding meme similarity and their emotional reactions. The comparison of experimental results with human judgments showed a 67.23% agreement, suggesting that the computational approach aligns well with human perception. Additionally, we implemented a text-based classifier using the DistilBERT model to categorize memes into one of six basic emotions. The results indicate that anger and joy are the dominant emotions in memes, with motivational memes eliciting stronger emotional responses. This research contributes to the study of multimodal memes, enhancing both language-based and visual approaches to analyzing and improving online visual communication and user experiences. Furthermore, it provides insights for better content moderation strategies on online platforms.
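
The similarity step described in the abstract can be illustrated with a minimal sketch. It assumes that each meme already has an image embedding and a text embedding (for example, produced by CLIP) and that the two are fused by averaging their unit-normalized vectors. The fusion rule, the 0.8 threshold, the function names, and the toy three-dimensional vectors are all illustrative assumptions, not details taken from the paper.

```python
import math
from itertools import combinations


def l2_normalize(vec):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def fuse(image_emb, text_emb):
    """Average the unit-normalized image and text embeddings (assumed fusion rule)."""
    img = l2_normalize(image_emb)
    txt = l2_normalize(text_emb)
    return l2_normalize([(a + b) / 2.0 for a, b in zip(img, txt)])


def cosine(u, v):
    """Cosine similarity of two unit vectors is their dot product."""
    return sum(a * b for a, b in zip(u, v))


def similar_pairs(memes, threshold=0.8):
    """Return (name_a, name_b, score) for every pair scoring at or above threshold."""
    fused = {name: fuse(img, txt) for name, (img, txt) in memes.items()}
    pairs = []
    for a, b in combinations(sorted(fused), 2):
        score = cosine(fused[a], fused[b])
        if score >= threshold:
            pairs.append((a, b, score))
    return pairs


# Toy embeddings standing in for real CLIP outputs (hypothetical values).
toy_memes = {
    "meme_a": ([1.0, 0.0, 0.0], [0.9, 0.1, 0.0]),
    "meme_b": ([0.9, 0.1, 0.0], [1.0, 0.0, 0.1]),
    "meme_c": ([0.0, 0.0, 1.0], [0.0, 0.2, 0.9]),
}

for a, b, score in similar_pairs(toy_memes):
    print(f"{a} ~ {b}: {score:.3f}")
```

In this toy data only meme_a and meme_b point in nearly the same direction, so they form the single pair above the threshold. With real embeddings the threshold would have to be tuned, for instance against human similarity judgments.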

Problem

Research questions and friction points this paper is trying to address.

Develops a multimodal method for comparing memes by similarity and by the emotions they elicit
Analyzes the interplay of visual and textual content in memes using CLIP and DistilBERT
Validates the computed similarity assessments against human yes/no judgments
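
One plausible way to compute an agreement figure like the reported 67.23% is simple percent agreement: the share of yes/no human responses that match the model's similar/not-similar decision for the same meme pair. The text does not state the exact formula used, so the sketch below is an assumption, and the votes in it are made-up toy data.

```python
def percent_agreement(model_votes, human_votes):
    """Percentage of positions where the model and the human gave the same yes/no answer."""
    if len(model_votes) != len(human_votes):
        raise ValueError("both vote lists must have the same length")
    if not model_votes:
        raise ValueError("at least one vote is required")
    matches = sum(m == h for m, h in zip(model_votes, human_votes))
    return 100.0 * matches / len(model_votes)


# Hypothetical decisions for four meme pairs (True = "similar").
model_votes = [True, True, False, True]
human_votes = [True, False, False, True]

print(f"{percent_agreement(model_votes, human_votes):.2f}% agreement")
```

Percent agreement does not correct for matches that would occur by chance, so a chance-corrected statistic such as Cohen's kappa is often reported alongside it.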

Innovation

Methods, ideas, or system contributions that make the work stand out.

Multimodal CLIP model for meme similarity
DistilBERT model for emotion classification
Combines visual and textual meme analysis
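
The emotion-classification step can be sketched as a softmax over six class scores followed by a tally of the predicted labels. The sketch assumes the classifier (DistilBERT in the paper) has already produced one logit per emotion for each meme text. The label list uses Ekman's six basic emotions as a stand-in, because this summary names only anger and joy, and the logits are invented toy values.

```python
import math
from collections import Counter

# Assumed label set (Ekman's six basic emotions); the paper's exact labels are not listed here.
EMOTIONS = ["anger", "disgust", "fear", "joy", "sadness", "surprise"]


def softmax(logits):
    """Convert raw scores to probabilities, shifting by the max for numerical stability."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_emotion(logits, labels=EMOTIONS):
    """Return the most probable label and its probability."""
    if len(logits) != len(labels):
        raise ValueError("expected one logit per label")
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return labels[best], probs[best]


def emotion_distribution(batch_logits, labels=EMOTIONS):
    """Count how often each emotion is the top prediction across a batch."""
    return Counter(predict_emotion(logits, labels)[0] for logits in batch_logits)


# Hypothetical classifier outputs for three meme captions.
toy_logits = [
    [2.0, 0.1, 0.0, 0.3, 0.2, 0.1],  # highest score at "anger"
    [0.1, 0.0, 0.0, 3.0, 0.2, 0.1],  # highest score at "joy"
    [1.5, 0.2, 0.1, 0.4, 0.0, 0.3],  # highest score at "anger"
]

print(emotion_distribution(toy_logits).most_common())
```

A tally of this kind is what supports a statement such as "anger and joy are the dominant emotions". Comparing the predicted probabilities between meme categories would be one way to examine whether motivational memes draw stronger responses.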