🤖 AI Summary
Existing emotional support dialogue systems suffer from two key limitations: insufficient personalized empathy due to inadequate user identity modeling, and opaque, coarse-grained reward signals that hinder verifiable empathic reasoning. To address these, we propose KardiaBench, the first user-anchored, multi-turn emotional support benchmark, together with Kardia-R1, a framework built on Rubric-as-Judge Empathetic Reinforcement Learning (Rubric-ERL), an explainable reinforcement learning method grounded in explicit, human-defined scoring rubrics. Kardia-R1 jointly optimizes user understanding, affective inference, and response generation via model-in-the-loop data construction, the GRPO algorithm, and a fine-grained affective reward mechanism, ensuring both psychological plausibility and role consistency. Evaluated on four leading large language models, our approach achieves significant improvements in affect accuracy, empathy level, relevance, role consistency, and safety, demonstrating its generality and effectiveness across diverse architectures.
📝 Abstract
As web platforms evolve towards greater personalization and emotional complexity, conversational agents must transcend superficial empathy to demonstrate identity-aware emotional reasoning. However, existing systems face two limitations: (1) reliance on situation-centric datasets lacking persistent user identity, which hampers the capture of personalized affective nuances; and (2) dependence on opaque, coarse reward signals that hinder the development of verifiable empathetic reasoning. To address these gaps, we introduce KardiaBench, a large-scale user-grounded benchmark comprising 178,080 QA pairs across 22,080 multi-turn conversations anchored to 671 real-world profiles. The dataset is constructed via a progressive, model-in-the-loop empathy pipeline that integrates user comprehension, contextual reasoning, and emotion perception into conversations, followed by iterative critique and rubric-guided refinement to ensure psychological plausibility, emotional fidelity, and persona consistency. Building on this benchmark, we propose Kardia-R1, a framework that trains models for interpretable, stepwise empathetic cognition. Kardia-R1 leverages Rubric-as-Judge Empathetic Reinforcement Learning (Rubric-ERL), a GRPO-based method that uses explainable, human-aligned rubric rewards to tightly couple user understanding, emotional inference, and supportive response generation. Extensive experiments across four LLM backbones demonstrate that Kardia-R1 consistently outperforms prior methods in emotion accuracy, empathy, relevance, persona consistency, and safety. Our dataset and model will be released at https://github.com/JhCircle/Kardia-R1.
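To make the Rubric-ERL idea concrete, here is a minimal sketch of how a rubric-as-judge reward and a GRPO-style group-relative advantage might be computed. The rubric dimensions, weights, and function names below are illustrative assumptions, not the paper's actual implementation; in practice the per-criterion scores would come from an LLM judge applying the human-defined rubric.

```python
# Hypothetical rubric weights; the paper's actual criteria and
# weighting scheme are not specified here.
RUBRIC = {
    "emotion_accuracy": 0.3,
    "empathy": 0.3,
    "relevance": 0.2,
    "persona_consistency": 0.1,
    "safety": 0.1,
}

def rubric_reward(scores: dict) -> float:
    """Weighted sum of per-criterion judge scores, each in [0, 1]."""
    return sum(RUBRIC[k] * scores[k] for k in RUBRIC)

def grpo_advantages(rewards: list) -> list:
    """GRPO-style advantage: normalize each sampled response's reward
    by the mean and standard deviation of its sampling group."""
    n = len(rewards)
    mean = sum(rewards) / n
    var = sum((r - mean) ** 2 for r in rewards) / n
    std = max(var ** 0.5, 1e-8)  # guard against a zero-variance group
    return [(r - mean) / std for r in rewards]
```

In this setup, several candidate responses are sampled per dialogue turn, each is scored against the rubric, and the group-normalized advantages drive the policy update, so the reward signal stays interpretable at the level of individual rubric criteria.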