🤖 AI Summary
To address insufficient discrimination between utterances in multi-turn emotion recognition caused by implicit turn modeling, this paper proposes an explicit turn-emphasis mechanism. Methodologically, it introduces a priority-based attention mechanism that uses turn position and speaker ID as dialogue features to explicitly modulate multi-head self-attention scores, overcoming the limitations of conventional implicit modeling via special tokens, and combines turn-level vector representations with an adaptation architecture built on a pre-trained language model. Empirically, the approach achieves high overall performance across four major benchmarks (IEMOCAP, MELD, EmoryNLP, and DailyDialog) and state-of-the-art results on IEMOCAP, a benchmark with long dialogues. By integrating structured dialogue features directly into the attention mechanism, this work offers an interpretable and scalable approach to multi-turn emotion recognition.
📝 Abstract
Emotion recognition in conversation (ERC) has attracted attention through methods that model multi-turn contexts. Feeding multi-turn input to a pre-trained model implicitly assumes that the current turn and the other turns are distinguished during training by special tokens inserted into the input sequence. This paper proposes a priority-based attention method, called Turn Emphasis with Dialogue (TED), that distinguishes each turn explicitly by adding dialogue features to the attention mechanism. TED assigns each turn a priority based on turn position and speaker information as dialogue features. It applies multi-head self-attention over turn-based vectors for the multi-turn input and adjusts the attention scores with the dialogue features. We evaluate TED on four typical benchmarks. The experimental results demonstrate that TED achieves high overall performance on all datasets and state-of-the-art performance on IEMOCAP, which has numerous turns per dialogue.
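The abstract's core idea, adjusting self-attention scores over turn-level vectors with a priority derived from turn position and speaker identity, can be sketched as follows. This is a minimal single-head illustration, not the paper's implementation: the specific bias form (`alpha` for recency, `beta` for same-speaker) and all parameter names are assumptions chosen for clarity.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def priority_attention(turn_vecs, turn_ids, speaker_ids, current_turn,
                       alpha=0.1, beta=0.1):
    """Self-attention over turn-level vectors with an additive priority bias.

    turn_vecs:    (T, d) array, one vector per dialogue turn
    turn_ids:     (T,) turn positions
    speaker_ids:  (T,) speaker IDs
    current_turn: index of the turn whose emotion is being classified
    alpha, beta:  hypothetical weights for recency and same-speaker bias
    """
    T, d = turn_vecs.shape
    # Plain scaled dot-product attention scores between turn vectors.
    scores = turn_vecs @ turn_vecs.T / np.sqrt(d)          # (T, T)
    # Dialogue-feature priority: favour turns near the current turn
    # (recency) and turns uttered by the current turn's speaker.
    recency = -np.abs(turn_ids - turn_ids[current_turn])   # closer -> larger
    same_spk = (speaker_ids == speaker_ids[current_turn]).astype(float)
    bias = alpha * recency + beta * same_spk               # (T,), one value per key
    # Adding the bias before the softmax explicitly re-weights each turn.
    weights = softmax(scores + bias[None, :], axis=-1)     # rows sum to 1
    return weights @ turn_vecs                             # (T, d) context vectors
```

A real model would apply this per head with learned query/key/value projections; the sketch only shows where the dialogue-feature bias enters, namely as an additive term on the pre-softmax attention scores.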