H2HTalk: Evaluating Large Language Models as Emotional Companion

📅 2025-07-04

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Problem: Large language models (LLMs) lack a systematic evaluation framework for emotional companionship. Method: We introduce H2HTalk, a benchmark for emotionally intelligent companions built on 4,650 curated scenarios spanning dialogue, recollection, and itinerary planning. It includes a Secure Attachment Persona (SAP) module that applies attachment-theory principles for safer interactions, and a unified evaluation protocol that assesses personality development and empathetic interaction across 50 LLMs. Contribution/Results: Benchmarking shows that long-horizon planning and memory retention remain key challenges, and that models struggle when user needs are implicit or evolve mid-conversation. H2HTalk is presented as the first comprehensive benchmark for emotionally intelligent companions, and all materials are released to support the development of LLMs that provide meaningful and safe psychological support.

📝 Abstract

As digital emotional support needs grow, Large Language Model companions offer promising authentic, always-available empathy, though rigorous evaluation lags behind model advancement. We present Heart-to-Heart Talk (H2HTalk), a benchmark assessing companions across personality development and empathetic interaction, balancing emotional intelligence with linguistic fluency. H2HTalk features 4,650 curated scenarios spanning dialogue, recollection, and itinerary planning that mirror real-world support conversations, substantially exceeding previous datasets in scale and diversity. We incorporate a Secure Attachment Persona (SAP) module implementing attachment-theory principles for safer interactions. Benchmarking 50 LLMs with our unified protocol reveals that long-horizon planning and memory retention remain key challenges, with models struggling when user needs are implicit or evolve mid-conversation. H2HTalk establishes the first comprehensive benchmark for emotionally intelligent companions. We release all materials to advance development of LLMs capable of providing meaningful and safe psychological support.

Problem

Research questions and friction points this paper is trying to address.

Evaluating whether LLMs can serve as emotional companions that offer authentic empathy

Assessing companions on personality development and empathetic interaction, balancing emotional intelligence with linguistic fluency

Identifying challenges in long-horizon planning and memory retention

Innovation

Methods, ideas, or system contributions that make the work stand out.

H2HTalk benchmark for emotional companion evaluation

Secure Attachment Persona module for safer interactions

Unified protocol testing 50 LLMs on key challenges

🔎 Similar Papers

Can Large Language Models be Good Emotional Supporter? Mitigating Preference Bias on Emotional Support Conversation

2024-02-20 · Annual Meeting of the Association for Computational Linguistics · Citations: 17