🤖 AI Summary
This study investigates whether large language model (LLM)-generated cognitive behavioral therapy (CBT) dialogues exhibit emotionally realistic dynamics comparable to authentic clinical conversations. To address this, we propose a discourse-level affective dynamics framework that, for the first time, models and compares the emotion trajectories of both therapist and client along three fine-grained affective dimensions—valence, arousal, and dominance. We further construct and publicly release RealCBT, the first annotated dataset of real-world CBT dialogues. Results show that authentic dialogues exhibit significantly higher affective variability, richer expressive nuance, and more natural affect-regulation patterns; notably, client affective dynamics in LLM-generated dialogues show markedly lower similarity to their real counterparts. This work exposes fundamental limitations of current LLMs in modeling affective authenticity for psychological interventions, establishing a critical evaluation benchmark and actionable directions for developing trustworthy AI-assisted mental health services.
📝 Abstract
Synthetic therapy dialogues generated by large language models (LLMs) are increasingly used in mental health NLP to simulate counseling scenarios, train models, and supplement limited real-world data. However, it remains unclear whether these synthetic conversations capture the nuanced emotional dynamics of real therapy. In this work, we conduct the first comparative analysis of emotional arcs between real and LLM-generated Cognitive Behavioral Therapy dialogues. We adapt the Utterance Emotion Dynamics framework to analyze fine-grained affective trajectories across the valence, arousal, and dominance dimensions. Our analysis spans both full dialogues and individual speaker roles (counselor and client), using real sessions transcribed from public videos and synthetic dialogues from the CACTUS dataset. We find that while synthetic dialogues are fluent and structurally coherent, they diverge from real conversations in key emotional properties: real sessions exhibit greater emotional variability, more emotion-laden language, and more authentic patterns of reactivity and regulation. Moreover, emotional arc similarity between real and synthetic speakers is low, especially for clients. These findings underscore the limitations of current LLM-generated therapy data and highlight the importance of emotional fidelity in mental health applications. We introduce RealCBT, a curated dataset of real CBT sessions, to support future research in this space.
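To make the analysis concrete, the sketch below illustrates one way a per-utterance valence trajectory and an arc-similarity score could be computed. This is a minimal illustration, not the paper's implementation: the tiny word-to-VAD lexicon, the whitespace tokenizer, and the Pearson-correlation similarity are toy stand-ins (real lexicon-based pipelines typically use a full VAD lexicon such as NRC-VAD and more careful preprocessing).

```python
# Toy stand-in for a VAD lexicon: word -> (valence, arousal, dominance) in [0, 1].
VAD = {
    "anxious": (0.2, 0.8, 0.3), "calm": (0.8, 0.2, 0.6),
    "hopeless": (0.1, 0.4, 0.2), "better": (0.7, 0.4, 0.6),
    "worried": (0.25, 0.7, 0.3), "fine": (0.65, 0.3, 0.55),
}

def utterance_vad(utterance):
    """Mean VAD vector over lexicon words in an utterance; None if no hits."""
    hits = [VAD[w] for w in utterance.lower().split() if w in VAD]
    if not hits:
        return None
    return tuple(sum(dim) / len(hits) for dim in zip(*hits))

def trajectory(utterances, dim=0):
    """Per-utterance scores along one dimension (0=valence, 1=arousal, 2=dominance)."""
    scores = (utterance_vad(u) for u in utterances)
    return [s[dim] for s in scores if s is not None]

def arc_similarity(a, b):
    """Pearson correlation of two equal-length emotional arcs."""
    assert len(a) == len(b), "toy version: arcs must already be aligned"
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = sum((x - ma) ** 2 for x in a) ** 0.5
    sb = sum((y - mb) ** 2 for y in b) ** 0.5
    return cov / (sa * sb)

# Hypothetical client utterances from a real and a synthetic session.
real = ["I feel anxious and hopeless", "I am worried",
        "I feel a bit better", "I am calm now"]
synth = ["I feel fine", "I am calm", "I feel better", "I am calm"]

sim = arc_similarity(trajectory(real), trajectory(synth))
print(f"valence-arc similarity: {sim:.3f}")
```

In this toy example the real arc climbs steeply from low to high valence while the synthetic arc stays flat and positive, so the correlation is modest — a simplified analogue of the low real-versus-synthetic client similarity reported above.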