GSE: Evaluating Sticker Visual Semantic Similarity via a General Sticker Encoder

📅 2025-11-07
🤖 AI Summary
Sticker semantic similarity assessment faces challenges including high content diversity, heavy symbolism, and the absence of standardized benchmarks and specialized models. This paper formally defines the sticker semantic similarity task for the first time, introduces Triple-S—the first high-quality, human-annotated triplet benchmark—and proposes the lightweight General Sticker Encoder (GSE). GSE is a Transformer-based architecture trained via multi-stage contrastive learning on Triple-S and additional sticker datasets, enabling robust modeling of symbolic semantics. Experiments demonstrate that GSE yields significantly superior semantic embeddings for unseen stickers compared to general-purpose vision models, achieving state-of-the-art performance on downstream tasks such as emotion classification and cross-domain retrieval. With minimal parameters and efficient inference, GSE serves as a deployable, generalizable foundation model for sticker understanding, accompanied by a rigorous evaluation paradigm.
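The summary mentions contrastive learning on triplets but does not specify the loss. As a rough illustration only, the sketch below uses a generic triplet margin loss over cosine distance, which is one common contrastive objective and not necessarily the one GSE uses. The function names, the margin value, and the toy 2-D embeddings are all assumptions made for this example.

```python
import math


def cosine_distance(u, v):
    """Return 1 - cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def triplet_margin_loss(anchor, positive, negative, margin=0.2):
    """Generic triplet loss (assumed, not taken from the paper).

    The loss is zero once the anchor is closer to the positive than to
    the negative by at least `margin`; otherwise it grows linearly.
    """
    d_pos = cosine_distance(anchor, positive)
    d_neg = cosine_distance(anchor, negative)
    return max(0.0, d_pos - d_neg + margin)


# Toy embeddings standing in for encoder outputs (hypothetical values).
anchor = [1.0, 0.0]
positive = [0.9, 0.1]   # semantically similar sticker
negative = [0.0, 1.0]   # semantically different sticker

loss_good = triplet_margin_loss(anchor, positive, negative)
loss_bad = triplet_margin_loss(anchor, negative, positive)  # roles swapped
```

With a well-separated triplet the loss is zero, while swapping the positive and negative roles yields a large loss, which is the signal that would push an encoder to pull similar stickers together.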

📝 Abstract
Stickers have become a popular form of visual communication, yet understanding their semantic relationships remains challenging due to their highly diverse and symbolic content. In this work, we formally define the Sticker Semantic Similarity task and introduce Triple-S, the first benchmark for this task, consisting of 905 human-annotated positive and negative sticker pairs. Through extensive evaluation, we show that existing pretrained vision and multimodal models struggle to capture nuanced sticker semantics. To address this, we propose the General Sticker Encoder (GSE), a lightweight and versatile model that learns robust sticker embeddings using both Triple-S and additional datasets. GSE achieves superior performance on unseen stickers and demonstrates strong results on downstream tasks such as emotion classification and sticker-to-sticker retrieval. By releasing both Triple-S and GSE, we provide standardized evaluation tools and robust embeddings, enabling future research in sticker understanding, retrieval, and multimodal content generation. The Triple-S benchmark and GSE have been publicly released.
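To make the pair-based evaluation concrete, here is a minimal sketch of how an encoder might be scored on labeled positive and negative pairs. The abstract does not state the metric or decision rule, so the cosine-similarity threshold, the `pair_accuracy` helper, and the toy embeddings below are assumptions for illustration, not the benchmark's official protocol.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def pair_accuracy(pairs, threshold=0.95):
    """Fraction of pairs whose thresholded similarity matches the label.

    `pairs` is a list of (embedding_a, embedding_b, label) tuples, where
    label 1 means "semantically similar" and 0 means "not similar".
    The threshold is an assumed hyperparameter, not a value from the paper.
    """
    correct = 0
    for emb_a, emb_b, label in pairs:
        predicted = 1 if cosine_similarity(emb_a, emb_b) >= threshold else 0
        correct += int(predicted == label)
    return correct / len(pairs)


# Hypothetical 2-D embeddings standing in for real encoder outputs.
toy_pairs = [
    ([1.0, 0.0], [0.9, 0.1], 1),  # very similar, labeled positive
    ([1.0, 0.0], [0.0, 1.0], 0),  # orthogonal, labeled negative
    ([0.5, 0.5], [0.6, 0.4], 1),  # similar, labeled positive
    ([1.0, 0.0], [1.0, 0.1], 0),  # similar embeddings but labeled negative
]

accuracy = pair_accuracy(toy_pairs)
```

The last toy pair is deliberately one the embeddings get wrong: the vectors are nearly parallel, yet the label is negative. Such disagreements between embedding similarity and human judgment are exactly the kind of case a human-annotated benchmark is meant to expose.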
Problem

Research questions and friction points this paper is trying to address.

Defining sticker semantic similarity evaluation task
Creating first benchmark dataset for sticker similarity assessment
Developing lightweight model for robust sticker embeddings
Innovation

Methods, ideas, or system contributions that make the work stand out.

Defines Sticker Semantic Similarity evaluation task
Introduces Triple-S benchmark with human annotations
Proposes lightweight General Sticker Encoder model
Heng Er Metilda Chee
DCST, Tsinghua University, Beijing, China; Quan Cheng Laboratory, Jinan, China
Jiayin Wang
Tsinghua University
User Modeling, Personalization
Zhiqiang Guo
DCST, Tsinghua University, Beijing, China
Weizhi Ma
Tsinghua University
LLM and Agents, Recommendation, AI for Healthcare
Min Zhang
DCST, Tsinghua University, Beijing, China; Quan Cheng Laboratory, Jinan, China