🤖 AI Summary
Current large language models generate semantically fluent yet structurally ungrounded responses to long-form Chinese psychotherapy texts, lacking the structured reasoning required for effective psychological intervention. To address this, we propose Chain-of-Empathy (CoE) reasoning, a cognitively grounded paradigm that integrates cognitive-behavioral therapy principles to enable interpretable, stepwise affective understanding. We further introduce Empathy-QA, the first large-scale Chinese empathetic question-answering dataset, and adopt a two-stage training strategy: supervised fine-tuning followed by reinforcement learning guided by a custom reward model to enhance therapeutic relevance and contextual adaptability. Experimental results demonstrate that our approach achieves a 44.30% Win@1 rate in human evaluation, significantly outperforming all baselines, and establishes new state-of-the-art performance in empathetic expression accuracy and clinical applicability.
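The stepwise affective understanding described above could be scaffolded as a structured prompt. The sketch below is purely illustrative: the step names follow the CoE sequence described in the paper (emotions, causes, intentions), but the tag format, wording, and `build_coe_prompt` helper are assumptions, not the authors' actual implementation.

```python
# Illustrative Chain-of-Empathy (CoE) prompt scaffold.
# Step order mirrors the paper's description; tags/wording are assumed.
COE_STEPS = ["emotion", "cause", "intention", "response"]

def build_coe_prompt(post: str) -> str:
    """Wrap a help-seeker's post with stepwise CoE reasoning instructions."""
    structure = "\n".join(
        f"<{step}>Reason about the help-seeker's {step} here.</{step}>"
        for step in COE_STEPS
    )
    return (
        "You are an empathetic counselor. Read the post below and "
        "reason step by step before replying.\n\n"
        f"Post: {post}\n\n"
        f"Answer using this structure:\n{structure}"
    )
```

A template like this makes each reasoning stage explicit in the model's output, which is what allows the thinking process to be inspected and supervised during fine-tuning.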
📝 Abstract
Empathy is critical for effective mental health support, especially when addressing Long Counseling Texts (LCTs). However, existing Large Language Models (LLMs) often generate replies that are semantically fluent but lack the structured reasoning necessary for genuine psychological support, particularly in a Chinese context. To bridge this gap, we introduce Empathy-R1, a novel framework that integrates a Chain-of-Empathy (CoE) reasoning process with Reinforcement Learning (RL) to enhance response quality for LCTs. Inspired by cognitive-behavioral therapy, our CoE paradigm guides the model to sequentially reason about a help-seeker's emotions, causes, and intentions, making its thinking process both transparent and interpretable. Our framework is empowered by a new large-scale Chinese dataset, Empathy-QA, and a two-stage training process. First, Supervised Fine-Tuning instills the CoE's reasoning structure. Subsequently, RL, guided by a dedicated reward model, refines the therapeutic relevance and contextual appropriateness of the final responses. Experiments show that Empathy-R1 achieves strong performance on key automatic metrics. More importantly, human evaluations confirm its superiority, showing a clear preference over strong baselines and achieving a Win@1 rate of 44.30% on our new benchmark. By enabling interpretable and contextually nuanced responses, Empathy-R1 represents a significant advancement in developing responsible and genuinely beneficial AI for mental health support.
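For readers unfamiliar with the Win@1 figure cited above, the sketch below computes it under the standard definition: the fraction of evaluation items on which human judges rank a system's response first among all candidates. This definition, and the tie-free ranking format, are assumptions; the abstract does not specify the paper's exact judging protocol.

```python
# Hedged sketch of the Win@1 metric: share of items where the given
# system's response is ranked first by human judges.
def win_at_1(rankings: list[list[str]], system: str = "Empathy-R1") -> float:
    """rankings: one ordering per evaluation item, best system first."""
    if not rankings:
        raise ValueError("need at least one ranking")
    wins = sum(1 for order in rankings if order[0] == system)
    return wins / len(rankings)

# Toy example: ranked first on 2 of 3 items -> Win@1 of about 0.667.
example = [
    ["Empathy-R1", "BaselineA", "BaselineB"],
    ["BaselineA", "Empathy-R1", "BaselineB"],
    ["Empathy-R1", "BaselineB", "BaselineA"],
]
print(win_at_1(example))
```

Because every item has exactly one first-ranked system, Win@1 rates across all compared systems sum to 1, which makes the reported 44.30% directly comparable across baselines.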