Emotion Transcription in Conversation: A Benchmark for Capturing Subtle and Complex Emotional States through Natural Language

📅 2026-03-07
🤖 AI Summary
This work proposes Emotion Transcription in Conversation (ETC), a novel task that replaces conventional categorical or dimensional emotion labels with natural language descriptions in order to capture the subtle, complex, and culturally specific emotional states of speakers in dialogue. To support the task, the authors construct a Japanese dataset of text-based dialogues paired with participants' self-reported emotional states, written in natural language, along with an emotion category label for each transcription. They benchmark baseline models on this dataset. Fine-tuning on the dataset improves performance, but current models still struggle to infer implicit emotional states. The publicly released dataset offers a benchmark for fine-grained, culturally aware emotion modeling in conversation.

📝 Abstract
Emotion Recognition in Conversation (ERC) is critical for enabling natural human-machine interactions. However, existing methods predominantly employ categorical or dimensional emotion annotations, which often fail to adequately represent complex, subtle, or culturally specific emotional nuances. To overcome this limitation, we propose a novel task named Emotion Transcription in Conversation (ETC). This task focuses on generating natural language descriptions that accurately reflect speakers' emotional states within conversational contexts. To address ETC, we constructed a Japanese dataset comprising text-based dialogues annotated with participants' self-reported emotional states, described in natural language. The dataset also includes emotion category labels for each transcription, enabling quantitative analysis and application to ERC. We benchmarked baseline models and found that, while fine-tuning on our dataset enhances model performance, current models still struggle to infer implicit emotional states. The ETC task will encourage further research into more expressive emotion understanding in dialogue. The dataset is publicly available at https://github.com/UEC-InabaLab/ETCDataset.
Problem

Research questions and friction points this paper is trying to address.

Emotion Recognition in Conversation
Emotion Transcription
Natural Language Description
Emotional Nuance
Conversational Context
Innovation

Methods, ideas, or system contributions that make the work stand out.

Emotion Transcription
Natural Language Description
Emotion Recognition in Conversation
Fine-grained Emotion Modeling
Japanese Dialogue Dataset
Yoshiki Tanaka
The University of Electro-Communications, Tokyo, Japan
Ryuichi Uehara
The University of Electro-Communications, Tokyo, Japan
Koji Inoue
Kyoto University
Spoken Dialogue System, Human-Robot Interaction, Turn-Taking
Michimasa Inaba
The University of Electro-Communications
Dialogue system, Data mining, Human-Computer Interaction, HCI