Reply with Sticker: New Dataset and Model for Sticker Retrieval

📅 2024-03-08
🏛️ arXiv.org
📈 Citations: 2
Influential: 0
🤖 AI Summary
Existing sticker retrieval methods predominantly retrieve stickers from the dialogue context and the current utterance. They treat stickers as a supplement to text and overlook their capacity to act as independent semantic units, e.g., as direct replies or as semantic completions of an utterance. To address this, we introduce StickerInt, a new open-domain conversational sticker retrieval dataset that supports both the “sticker-as-reply” and the “sticker-as-semantic-completion” paradigms. We formulate and model the task of using a sticker as a standalone reply. Furthermore, we propose Int-RA, which combines knowledge-enhanced intent prediction with relation-aware cross-modal sticker selection, jointly encoding dialogue intent and multimodal (text-image) semantics. On StickerInt, Int-RA achieves state-of-the-art performance. We publicly release the dataset and code to advance sticker retrieval toward more natural, holistic human–machine interaction.

📝 Abstract
Using stickers in online chatting is very prevalent on social media platforms, where the stickers used in a conversation can express someone's intention, emotion, or attitude in a vivid, tactful, and intuitive way. Existing sticker retrieval research typically retrieves stickers based on the context and the current utterance delivered by the user. That is, the stickers serve as a supplement to the current utterance. However, in real-world scenarios, using stickers to express what we want to say, rather than only as a supplement to our words, is also important. Therefore, in this paper, we create a new dataset for sticker retrieval in conversation, called StickerInt, where stickers are used to reply to previous conversations or to supplement our words. (We believe that the release of this dataset will provide a more complete paradigm than existing work for research on sticker retrieval in open-domain online conversation.) Based on the created dataset, we present a simple yet effective framework for sticker retrieval in conversation based on learning the intention and the cross-modal relationships between the conversation context and stickers, coined Int-RA. Specifically, we first devise a knowledge-enhanced intention predictor to introduce intention information into the conversation representations. Subsequently, a relation-aware sticker selector is devised to retrieve the response sticker via cross-modal relationships. Extensive experiments on the created dataset show that the proposed model achieves state-of-the-art performance in sticker retrieval. (The dataset and source code of this work are released at https://github.com/HITSZ-HLT/Int-RA.)
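The abstract states that Int-RA injects intention information into the conversation representation and then selects a response sticker through cross-modal relationships. As a rough, non-authoritative illustration of a generic ranking step only, the Python sketch below assumes that the conversation context, the predicted intention, and each candidate sticker are already encoded as vectors of the same size. It fuses context and intention with a simple weighted average and ranks stickers by cosine similarity. The function names (`fuse`, `rank_stickers`), the fusion weight `alpha`, and the toy vectors are hypothetical and do not come from the paper; the actual knowledge-enhanced intention predictor and relation-aware sticker selector are learned models described in the paper and its repository.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat zero vectors as unrelated
    return dot / (norm_a * norm_b)


def fuse(context: Sequence[float], intention: Sequence[float], alpha: float = 0.5) -> List[float]:
    """Hypothetical fusion: a convex combination of context and intention vectors."""
    if len(context) != len(intention):
        raise ValueError("context and intention must have the same length")
    return [(1.0 - alpha) * c + alpha * i for c, i in zip(context, intention)]


def rank_stickers(query: Sequence[float], stickers: Dict[str, Sequence[float]]) -> List[Tuple[str, float]]:
    """Return (sticker_id, score) pairs sorted from most to least similar to the query."""
    scored = [(sid, cosine(query, vec)) for sid, vec in stickers.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Toy 3-d embeddings (hypothetical); real systems would use learned text and image encoders.
    context_vec = [1.0, 0.0, 0.0]
    intention_vec = [0.0, 1.0, 0.0]
    candidates = {
        "thumbs_up": [1.0, 1.0, 0.0],
        "laugh": [1.0, 0.0, 0.0],
        "crying": [0.0, 0.0, 1.0],
    }
    query = fuse(context_vec, intention_vec)
    for sticker_id, score in rank_stickers(query, candidates):
        print(f"{sticker_id}: {score:.3f}")
```

A weighted average and cosine similarity are used here only because they are the simplest stand-ins; they say nothing about how Int-RA itself models intention or cross-modal relationships.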
Problem

Research questions and friction points this paper is trying to address.

Sticker Retrieval
Independent Expression
Conversation Context
Innovation

Methods, ideas, or system contributions that make the work stand out.

StickerInt dataset
Int-RA framework
cross-modal relationships
Bin Liang
Department of Systems Engineering and Engineering Management, The Chinese University of Hong Kong, Hong Kong, China
Bingbing Wang
Harbin Institute of Technology, Shenzhen
natural language processing
Zhixin Bai
Harbin Institute of Technology
natural language processing
Qiwei Lang
School of Computer Science and Technology, Harbin Institute of Technology (Shenzhen), Shenzhen, China
Mingwei Sun
School of Computer Science and Technology, Harbin Institute of Technology (Shenzhen), Shenzhen, China
Kaiheng Hou
School of Computer Science and Technology, Harbin Institute of Technology (Shenzhen), Shenzhen, China
Kam-Fai Wong
Department of Systems Engineering and Engineering Management, The Chinese University of Hong Kong, Hong Kong, China
Ruifeng Xu
Professor, Harbin Institute of Technology at Shenzhen
Natural Language Processing, Affective Computing, Argumentation Mining, LLMs, Bioinformatics