🤖 AI Summary
Current end-to-end spoken dialogue systems rely on rigid supervisory signals, such as ground-truth responses or numerical preference scores, which makes it difficult to capture the complexity and diversity of empathetic expression. To address this limitation, this work proposes ReEmpathy, an end-to-end dialogue model with empathetic self-reflection capabilities. ReEmpathy perceives and refines its empathetic responses by alternating between spoken response generation and free-form empathetic reflective reasoning. The core innovation is EmpathyEval, the first natural language–based empathy evaluation model, which drives the self-reflection mechanism. Experimental results show that ReEmpathy substantially improves the quality of empathy-sensitive dialogues, offering a new paradigm for developing emotionally intelligent human–computer interaction systems.
📝 Abstract
End-to-end Spoken Language Models (SLMs) hold great potential for paralinguistic perception, and numerous studies have aimed to enhance their capabilities, particularly for empathetic dialogue. However, current approaches largely depend on rigid supervisory signals, such as ground-truth responses in supervised fine-tuning or preference scores in reinforcement learning. Such reliance is fundamentally limited for modeling complex empathy: there is no single "correct" response, and a simple numerical score cannot fully capture the nuances of emotional expression or the appropriateness of empathetic behavior. To address these limitations, we first introduce EmpathyEval, a descriptive natural-language-based evaluation model for assessing empathetic quality in spoken dialogues. Building upon EmpathyEval, we propose ReEmpathy, an end-to-end SLM that enhances empathetic dialogue through a novel Empathetic Self-Reflective Alternating Inference mechanism, which interleaves spoken response generation with free-form, empathy-related reflective reasoning. Extensive experiments demonstrate that ReEmpathy substantially improves empathy-sensitive spoken dialogue by enabling reflective reasoning, offering a promising approach toward more emotionally intelligent and empathy-aware human-computer interaction.