🤖 AI Summary
To address the misalignment in empathy levels between model-generated responses and human reference responses in empathetic dialogue generation, this paper proposes EmpRL, a reinforcement learning (RL)-based empathy alignment framework. The authors formulate empathy alignment as an RL task for the first time and design an empathy reward function integrating three empathy communication mechanisms: emotional reaction, interpretation, and exploration, enabling explicit alignment along both affective and cognitive dimensions. EmpRL initializes its policy from a fine-tuned pre-trained T5 generator, scores responses with pre-designed, pre-trained empathy identifiers, and optimizes the policy with the proximal policy optimization (PPO) algorithm in a two-stage paradigm of maximum-likelihood pre-training followed by RL fine-tuning. Automatic and human evaluations show that EmpRL improves the similarity in empathy levels between generated and reference responses by 23.6%, with statistically significant gains in overall empathy quality over all baselines.
📝 Abstract
Empathetic response generation, aiming to understand the user's situation and feelings and respond empathically, is crucial in building human-like dialogue systems. Traditional approaches typically employ maximum likelihood estimation as the optimization objective during training, yet fail to align the empathy levels between generated and target responses. To this end, we propose an empathetic response generation framework using reinforcement learning (EmpRL). The framework develops an effective empathy reward function and generates empathetic responses by maximizing the expected reward through reinforcement learning. EmpRL utilizes the pre-trained T5 model as the generator and further fine-tunes it to initialize the policy. To align the empathy levels between generated and target responses within a given context, an empathy reward function containing three empathy communication mechanisms -- emotional reaction, interpretation, and exploration -- is constructed using pre-designed and pre-trained empathy identifiers. During reinforcement learning training, the proximal policy optimization algorithm is used to fine-tune the policy, enabling the generation of empathetic responses. Both automatic and human evaluations demonstrate that the proposed EmpRL framework significantly improves the quality of generated responses, enhances the similarity in empathy levels between generated and target responses, and produces empathetic responses covering both affective and cognitive aspects.
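To make the reward design concrete, here is a minimal sketch of the alignment idea the abstract describes: score a response along the three empathy communication mechanisms and reward the policy for matching the empathy levels of the reference response. The keyword-based `empathy_levels` scorer below is a hypothetical stand-in for illustration only; the paper uses pre-trained neural empathy identifiers, not heuristics.

```python
# Hedged sketch of an empathy-alignment reward, assuming stub scorers.
# The real EmpRL framework scores responses with pre-trained identifiers
# and feeds this reward into PPO fine-tuning of a T5 policy.

MECHANISMS = ("emotional_reaction", "interpretation", "exploration")

def empathy_levels(response: str) -> dict:
    """Stub empathy identifier: returns a level in {0, 1, 2} per mechanism.
    Illustrative keyword heuristic, NOT the paper's trained classifiers."""
    cues = {
        "emotional_reaction": ("sorry", "glad", "feel for you"),
        "interpretation": ("understand", "must have", "sounds like"),
        "exploration": ("what", "how", "?"),
    }
    text = response.lower()
    return {m: min(2, sum(c in text for c in cues[m])) for m in MECHANISMS}

def empathy_reward(generated: str, reference: str) -> float:
    """Higher (closer to 0.0) when generated and reference responses
    have similar empathy levels across all three mechanisms."""
    g, r = empathy_levels(generated), empathy_levels(reference)
    return -sum(abs(g[m] - r[m]) for m in MECHANISMS) / len(MECHANISMS)

reward = empathy_reward(
    "I'm so sorry, that sounds like a lot. How are you coping?",
    "I understand, that must have been hard. What happened next?",
)
```

During RL training, this scalar would serve as the episode reward that PPO maximizes, steering the policy toward responses whose affective and cognitive empathy levels track the reference.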