EMO-Reasoning: Benchmarking Emotional Reasoning Capabilities in Spoken Dialogue Systems

📅 2025-08-24
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Current spoken dialogue systems lack systematic evaluation of affective reasoning capabilities, particularly cross-turn emotional coherence. To address this gap, the authors propose the first benchmark framework specifically designed for assessing emotional coherence in speech-based dialogues. The method introduces a cross-turn affective reasoning scoring mechanism and leverages text-to-speech (TTS) synthesis to generate diverse, high-fidelity spoken evaluation data spanning multiple emotion categories and intensity levels. The framework integrates three complementary metric types: continuous (e.g., emotion intensity trajectory), categorical (e.g., polarity consistency), and perceptual (human subjective judgments), enabling multidimensional, reproducible assessment. Experiments across seven state-of-the-art dialogue systems reveal prevalent patterns of emotional inconsistency, demonstrating the framework's effectiveness and generalizability in detecting and quantifying emotional coherence deficits.

📝 Abstract
Speech emotions play a crucial role in human-computer interaction, shaping engagement and context-aware communication. Despite recent advances in spoken dialogue systems, a holistic system for evaluating emotional reasoning is still lacking. To address this, we introduce EMO-Reasoning, a benchmark for assessing emotional coherence in dialogue systems. It leverages a curated dataset generated via text-to-speech to simulate diverse emotional states, overcoming the scarcity of emotional speech data. We further propose the Cross-turn Emotion Reasoning Score to assess the emotion transitions in multi-turn dialogues. Evaluating seven dialogue systems through continuous, categorical, and perceptual metrics, we show that our framework effectively detects emotional inconsistencies, providing insights for improving current dialogue systems. By releasing a systematic evaluation benchmark, we aim to advance emotion-aware spoken dialogue modeling toward more natural and adaptive interactions.
Problem

Research questions and friction points this paper is trying to address.

Lacking holistic evaluation for emotional reasoning in dialogue systems
Addressing scarcity of emotional speech data for benchmarking
Assessing emotional coherence and transitions in multi-turn dialogues
Innovation

Methods, ideas, or system contributions that make the work stand out.

Text-to-speech generated emotional dataset
Cross-turn Emotion Reasoning Score metric
Multi-metric emotional coherence evaluation framework
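The paper does not reproduce its scoring formulas here, but the continuous and categorical metric types it names can be illustrated with a toy sketch. The code below is NOT the paper's actual Cross-turn Emotion Reasoning Score; it only assumes, for illustration, that each turn carries a predicted polarity label and an intensity in [0, 1], and scores how smoothly intensity evolves and how rarely polarity flips between adjacent turns.

```python
# Toy sketch of cross-turn emotional coherence metrics (illustrative only;
# not the paper's published Cross-turn Emotion Reasoning Score).
from dataclasses import dataclass


@dataclass
class Turn:
    polarity: str      # assumed labels: "positive", "negative", "neutral"
    intensity: float   # assumed emotion intensity in [0, 1]


def intensity_trajectory_score(turns):
    """Continuous metric: penalize abrupt intensity jumps between turns.
    Returns 1.0 for a perfectly smooth trajectory."""
    if len(turns) < 2:
        return 1.0
    jumps = [abs(b.intensity - a.intensity) for a, b in zip(turns, turns[1:])]
    return 1.0 - sum(jumps) / len(jumps)


def polarity_consistency_score(turns):
    """Categorical metric: fraction of adjacent turn pairs that do not
    flip directly between positive and negative polarity."""
    if len(turns) < 2:
        return 1.0
    flips = sum(
        1 for a, b in zip(turns, turns[1:])
        if {a.polarity, b.polarity} == {"positive", "negative"}
    )
    return 1.0 - flips / (len(turns) - 1)


dialogue = [Turn("positive", 0.6), Turn("positive", 0.7), Turn("negative", 0.9)]
print(round(intensity_trajectory_score(dialogue), 2))  # 0.85
print(round(polarity_consistency_score(dialogue), 2))  # 0.5
```

A full framework along the paper's lines would combine such automatic scores with the perceptual (human judgment) dimension, which cannot be sketched in code.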