🤖 AI Summary
Task-oriented spoken dialogue systems (SDSs) have largely neglected speech emotion modeling, mainly because SDS research and emotional text-to-speech (TTS) research have developed separately and because standardized evaluation metrics for social goals such as empathy are lacking. This paper introduces an empathetic spoken dialogue system for news conversations that regulates the emotion of its speech according to contextual cues. Methodologically, it integrates emotional TTS into a task-oriented SDS, combining an LLM-based sentiment analyzer, which identifies the appropriate emotion, with PromptTTS-based speech synthesis. The authors also propose a subjective evaluation scale designed for emotional SDSs. In experiments, the system outperformed a baseline in both emotion regulation and user engagement, suggesting that vocal emotion plays a critical role in making conversations more engaging.
📝 Abstract
We develop a task-oriented spoken dialogue system (SDS) that regulates emotional speech based on contextual cues to enable more empathetic news conversations. Despite advances in emotional text-to-speech (TTS) techniques, task-oriented emotional SDSs remain underexplored because SDS and emotional TTS research are compartmentalized and because standardized evaluation metrics for social goals are lacking. We address these challenges by developing an emotional SDS for news conversations that uses a large language model (LLM)-based sentiment analyzer to identify appropriate emotions and PromptTTS to synthesize context-appropriate emotional speech. We also propose a subjective evaluation scale for emotional SDSs and use it to judge the emotion regulation performance of the proposed and baseline systems. Experiments showed that our emotional SDS outperformed a baseline system in both emotion regulation and engagement. These results suggest that speech emotion plays a critical role in making conversations more engaging. All our source code is open-sourced at https://github.com/dhatchi711/espnet-emotional-news/tree/emo-sds/egs2/emo_news_sds/sds1