Sentiment-enhanced Graph-based Sarcasm Explanation in Dialogue

📅 2024-02-06
🏛️ arXiv.org
📈 Citations: 2
Influential: 0
📄 PDF
🤖 AI Summary
This work addresses the task of sarcasm explanation generation in multimodal dialogue (SED), modeling fine-grained sentiment contrasts across textual, visual, and acoustic modalities to improve sarcasm interpretability. The proposed framework, EDGE, builds a context-sentiment heterogeneous graph and introduces (i) a lexicon-guided utterance sentiment inference module with a heuristic sentiment refinement strategy and (ii) a Joint Cross Attention-based Sentiment Inference (JCA-SI) module, obtained by extending the multimodal sentiment analysis model JCA, to derive joint sentiment labels for video-audio clips. The framework couples this graph-based sentiment modeling with the BART generative architecture. Evaluated on the WITS dataset, the method outperforms state-of-the-art approaches, and both automatic metrics and human evaluation indicate that the sentiment graph modeling improves the accuracy, consistency, and comprehensibility of generated sarcasm explanations.

📝 Abstract
Sarcasm Explanation in Dialogue (SED) is a new yet challenging task, which aims to generate a natural language explanation for a given sarcastic dialogue that involves multiple modalities (i.e., utterance, video, and audio). Although existing studies have achieved great success based on the generative pretrained language model BART, they overlook exploiting the sentiments residing in the utterance, video, and audio, which play important roles in reflecting sarcasm that essentially involves subtle sentiment contrasts. Nevertheless, it is non-trivial to incorporate sentiments for boosting SED performance, due to three main challenges: 1) diverse effects of utterance tokens on sentiments; 2) the gap between video-audio sentiment signals and the embedding space of BART; and 3) various relations among utterances, utterance sentiments, and video-audio sentiments. To tackle these challenges, we propose a novel sEntiment-enhanceD Graph-based multimodal sarcasm Explanation framework, named EDGE. In particular, we first propose a lexicon-guided utterance sentiment inference module, where a heuristic utterance sentiment refinement strategy is devised. We then develop a module named Joint Cross Attention-based Sentiment Inference (JCA-SI) by extending the multimodal sentiment analysis model JCA to derive the joint sentiment label for each video-audio clip. Thereafter, we devise a context-sentiment graph to comprehensively model the semantic relations among the utterances, utterance sentiments, and video-audio sentiments, to facilitate sarcasm explanation generation. Extensive experiments on the publicly released dataset WITS verify the superiority of our model over cutting-edge methods.
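To make the third contribution concrete, the context-sentiment graph described above can be sketched as a heterogeneous graph linking utterance nodes, their textual sentiment labels, and their video-audio sentiment labels. The node scheme and edge choices below (context edges between consecutive utterances, sentiment edges within each turn) are illustrative assumptions, not the paper's exact construction:

```python
# Hypothetical sketch of a context-sentiment graph, assuming three node types:
# ("U", i) utterance nodes, ("S", i, label) utterance sentiment nodes, and
# ("V", i, label) video-audio sentiment nodes. Edge scheme is illustrative.

def build_context_sentiment_graph(utterances, text_sents, va_sents):
    """Return an undirected adjacency list over typed nodes."""
    graph = {}

    def add_edge(a, b):
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)

    for i, _ in enumerate(utterances):
        u = ("U", i)
        add_edge(u, ("S", i, text_sents[i]))  # utterance <-> textual sentiment
        add_edge(u, ("V", i, va_sents[i]))    # utterance <-> video-audio sentiment
        if i > 0:
            add_edge(u, ("U", i - 1))         # dialogue context edge

    return graph

g = build_context_sentiment_graph(
    ["Great job breaking the build.", "Thanks, I try."],
    ["positive", "positive"],   # lexicon-derived utterance sentiments
    ["negative", "neutral"],    # JCA-SI-style video-audio sentiments
)
```

A graph neural network or BART cross-attention over such a structure could then surface the sentiment contrast (textual "positive" vs. video-audio "negative" on the first turn) that signals sarcasm.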
Problem

Research questions and friction points this paper is trying to address.

Sarcasm Explanation in Dialogue
Multimodal Sentiment Modeling
Subtle Sentiment Contrast
Innovation

Methods, ideas, or system contributions that make the work stand out.

Sentiment-enhanced Framework (EDGE)
JCA-SI Module
Context-Sentiment Graph
Liqiang Jing
University of Texas at Dallas
Multimedia Analysis · Multimodal · Natural Language Processing
Xuemeng Song
City University of Hong Kong
Information Retrieval · Multimedia Analysis
Meng Liu
School of Computer Science and Technology, Shandong Jianzhu University, Jinan 250101, China
Yupeng Hu
School of Software, Shandong University, Jinan 250101, China
Liqiang Nie
School of Computer Science and Technology, Harbin Institute of Technology (Shenzhen), Shenzhen 518055, China