🤖 AI Summary
Large language models (LLMs) often lack authentic empathic listening and emotionally grounded interaction in dialogue. Method: We propose a human-AI collaborative fine-tuning framework: (1) augmenting a small expert-curated empathic dialogue dataset using ChatGPT and Gemini to generate high-quality synthetic samples; and (2) introducing a dual-path evaluation protocol integrating structured sentiment analysis (VADER) with multi-dimensional expert annotation—specifically assessing emotional trajectory evolution, depth of empathic perception, and response coherence. Contribution/Results: Experiments reveal that structural sentiment alignment alone is insufficient for genuine empathy; qualitative depth significantly modulates user-perceived empathy. Systematic inter-model differences in empathic quality are observed. Our work validates the necessity of human-AI collaboration in building empathic dialogue agents and establishes a reproducible methodological paradigm for evaluating and optimizing empathic LLMs.
📝 Abstract
Conversational agents have made significant progress since ELIZA, expanding their role across various domains, including healthcare, education, and customer service. As these agents become increasingly integrated into daily human interactions, emotional intelligence, particularly empathetic listening, becomes essential. In this study, we explore how Large Language Models (LLMs) respond when tasked with generating emotionally rich interactions. Starting from a small dataset manually crafted by an expert to reflect empathic behavior, we extended the conversations using two LLMs: ChatGPT and Gemini. We analyzed the emotional progression of the dialogues using both sentiment analysis (via VADER) and expert assessments. While the generated conversations often mirrored the intended emotional structure, human evaluation revealed important differences in the perceived empathy and coherence of the responses. These findings suggest that emotion modeling in dialogues requires not only structural alignment in the expressed emotions but also qualitative depth, highlighting the importance of combining automated and human-centered methods in the development of emotionally competent agents.