🤖 AI Summary
Clustering Internet memes is difficult because meme similarity is multimodal, culturally dependent, and constantly evolving; existing approaches largely rely on predefined databases and overlook semantic diversity. This paper proposes a database-free adaptive clustering framework based on template matching. It models similarity along four dimensions (form, visual content, text, and identity) and clusters memes using both local and global features. Because the method does not depend on a fixed template repository, it can adapt its matching to new memes. The authors report that the combined approach outperforms existing clustering methods, yielding clusters that are more consistent, coherent, and aligned with human intuition. The implementation is publicly available.
📝 Abstract
Meme clustering is critical for toxicity detection, virality modeling, and typing, but it has received little attention in previous research. Clustering similar Internet memes is challenging due to their multimodality, cultural context, and adaptability. Existing approaches rely on databases, overlook semantics, and struggle to handle diverse dimensions of similarity. This paper introduces a novel method that uses template-based matching with multi-dimensional similarity features, thus eliminating the need for predefined databases and supporting adaptive matching. Memes are clustered using local and global features across similarity categories such as form, visual content, text, and identity. Our combined approach outperforms existing clustering methods, producing more consistent and coherent clusters, while similarity-based feature sets enable adaptability and align with human intuition. We make all supporting code publicly available to support subsequent research. Code: https://github.com/tygobl/meme-clustering
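The idea of clustering over several similarity dimensions can be illustrated with a minimal sketch. This is not the authors' implementation (see the repository above for that); it assumes that a pairwise similarity matrix already exists for each dimension, combines them with a weighted average, and groups memes by connected components above a threshold. The function names `combine_similarities` and `cluster_by_threshold`, the equal weights, the threshold of 0.5, and the toy similarity values are all hypothetical choices made for illustration.

```python
from typing import Dict, List

SimMatrix = List[List[float]]


def combine_similarities(per_dim: Dict[str, SimMatrix],
                         weights: Dict[str, float]) -> SimMatrix:
    """Weighted average of per-dimension pairwise similarity matrices."""
    names = list(per_dim)
    n = len(per_dim[names[0]])
    total = sum(weights[name] for name in names)
    combined = [[0.0] * n for _ in range(n)]
    for name in names:
        w = weights[name] / total  # normalize so the weights sum to 1
        for i in range(n):
            for j in range(n):
                combined[i][j] += w * per_dim[name][i][j]
    return combined


def cluster_by_threshold(sim: SimMatrix, threshold: float) -> List[int]:
    """Label connected components of the graph linking pairs with sim >= threshold."""
    n = len(sim)
    parent = list(range(n))

    def find(x: int) -> int:
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i in range(n):
        for j in range(i + 1, n):
            if sim[i][j] >= threshold:
                parent[find(i)] = find(j)

    # Relabel component roots as 0, 1, 2, ... in order of first appearance.
    roots: Dict[int, int] = {}
    return [roots.setdefault(find(i), len(roots)) for i in range(n)]


# Toy data (assumed): four memes, where 0 and 1 share one template and 2 and 3 another.
per_dim = {
    "form":     [[1.0, 0.9, 0.2, 0.1], [0.9, 1.0, 0.3, 0.2],
                 [0.2, 0.3, 1.0, 0.8], [0.1, 0.2, 0.8, 1.0]],
    "visual":   [[1.0, 0.8, 0.1, 0.2], [0.8, 1.0, 0.2, 0.1],
                 [0.1, 0.2, 1.0, 0.9], [0.2, 0.1, 0.9, 1.0]],
    "text":     [[1.0, 0.4, 0.3, 0.2], [0.4, 1.0, 0.2, 0.3],
                 [0.3, 0.2, 1.0, 0.5], [0.2, 0.3, 0.5, 1.0]],
    "identity": [[1.0, 0.7, 0.1, 0.1], [0.7, 1.0, 0.1, 0.1],
                 [0.1, 0.1, 1.0, 0.6], [0.1, 0.1, 0.6, 1.0]],
}
weights = {name: 1.0 for name in per_dim}  # equal weights, an assumption

combined = combine_similarities(per_dim, weights)
labels = cluster_by_threshold(combined, threshold=0.5)
print(labels)  # [0, 0, 1, 1]
```

In this toy example the text dimension alone is ambiguous (0.4 and 0.5 for the matching pairs), but the combined score separates the two groups clearly, which is the intuition behind using several similarity dimensions together.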