EmoNews: A Spoken Dialogue System for Expressive News Conversations

📅 2025-06-16
🤖 AI Summary
Task-oriented spoken dialogue systems (SDS) have long neglected speech emotion modeling, primarily due to the disciplinary separation between SDS and expressive text-to-speech (TTS) research, as well as the absence of empathy-oriented evaluation metrics. This paper introduces the first empathetic spoken dialogue system tailored to news scenarios, enabling expressive, context-aware emotional modulation in spoken interaction. Methodologically, we propose the first deep end-to-end integration of emotional TTS and task-oriented SDS, unifying an LLM-driven emotion analyzer, PromptTTS-based speech synthesis, and a coherent dialogue management framework. We further introduce the first subjective evaluation scale specifically designed for emotional SDS. Experimental results demonstrate significant improvements over baselines in both emotion regulation accuracy and user engagement, empirically validating the critical role of vocal emotion in enhancing conversational appeal.

📝 Abstract
We develop a task-oriented spoken dialogue system (SDS) that regulates emotional speech based on contextual cues to enable more empathetic news conversations. Despite advances in emotional text-to-speech (TTS) techniques, task-oriented emotional SDSs remain underexplored because SDS and emotional TTS research are compartmentalized and because standardized evaluation metrics for social goals are lacking. We address these challenges by developing an emotional SDS for news conversations that uses a large language model (LLM)-based sentiment analyzer to identify appropriate emotions and PromptTTS to synthesize context-appropriate emotional speech. We also propose a subjective evaluation scale for emotional SDSs and use it to assess the emotion regulation performance of the proposed and baseline systems. Experiments showed that our emotional SDS outperformed a baseline system in terms of emotion regulation and engagement. These results suggest that speech emotion plays a critical role in making conversations more engaging. All our source code is open-sourced at https://github.com/dhatchi711/espnet-emotional-news/tree/emo-sds/egs2/emo_news_sds/sds1
Problem

Research questions and friction points this paper is trying to address.

Develops emotional speech regulation for empathetic news conversations
Bridges gap between SDS and emotional TTS research
Proposes evaluation metrics for emotion regulation in SDS
Innovation

Methods, ideas, or system contributions that make the work stand out.

LLM-based sentiment analyzer for emotion identification
PromptTTS for context-appropriate emotional speech synthesis
Subjective evaluation scale for emotion regulation assessment
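
The flow described above (an analyzer picks an emotion for the reply, and a prompt-conditioned TTS renders it) can be sketched roughly as follows. This is an illustrative sketch only, not the authors' implementation: the emotion labels, the style prompts, and the names `respond`, `keyword_analyzer`, and `fake_tts` are all hypothetical, and the analyzer and synthesizer are stubs standing in for an LLM call and a PromptTTS model.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical emotion labels and style prompts (assumed for illustration;
# the label set used by the actual system is not stated in this summary).
STYLE_PROMPTS = {
    "positive": "Speak in a cheerful, upbeat voice.",
    "negative": "Speak in a soft, sympathetic voice.",
    "neutral": "Speak in a calm, neutral voice.",
}


@dataclass
class SpokenReply:
    text: str
    emotion: str
    style_prompt: str
    audio: bytes


def respond(
    reply_text: str,
    analyze_emotion: Callable[[str], str],
    synthesize: Callable[[str, str], bytes],
) -> SpokenReply:
    """Pick an emotion for the reply, then synthesize it with a matching style prompt."""
    emotion = analyze_emotion(reply_text)
    # Fall back to neutral if the analyzer returns a label we have no prompt for.
    if emotion not in STYLE_PROMPTS:
        emotion = "neutral"
    style_prompt = STYLE_PROMPTS[emotion]
    audio = synthesize(reply_text, style_prompt)
    return SpokenReply(reply_text, emotion, style_prompt, audio)


def keyword_analyzer(text: str) -> str:
    """Stub standing in for the LLM-based sentiment analyzer (assumption)."""
    lowered = text.lower()
    if any(word in lowered for word in ("died", "disaster", "crisis")):
        return "negative"
    if any(word in lowered for word in ("won", "celebrat", "record")):
        return "positive"
    return "neutral"


def fake_tts(text: str, style_prompt: str) -> bytes:
    """Stub standing in for a prompt-conditioned TTS model (assumption)."""
    return f"[{style_prompt}] {text}".encode("utf-8")


if __name__ == "__main__":
    reply = respond("The local team won the championship.", keyword_analyzer, fake_tts)
    print(reply.emotion, "|", reply.style_prompt)
```

Passing the analyzer and synthesizer in as callables keeps the dialogue logic independent of any particular LLM or TTS backend, so either stub could be swapped for a real model without changing `respond`.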
Ryuki Matsuura
Carnegie Mellon University
Shikhar Bharadwaj
Carnegie Mellon University
Speech processing, Self-supervised learning
Jiarui Liu
Carnegie Mellon University
Natural Language Processing
Dhatchi Kunde Govindarajan
Carnegie Mellon University