Therapy as an NLP Task: Psychologists' Comparison of LLMs and Human Peers in CBT

📅 2024-09-03
🏛️ arXiv.org
📈 Citations: 10
Influential: 0
📄 PDF
🤖 AI Summary
This study evaluates the clinical feasibility of large language models (LLMs) as substitutes for human therapists in single-session cognitive behavioral therapy (CBT). Method: The authors used the HELPERT prompting framework to structure LLM-generated CBT dialogues and conducted blinded, session-level quantitative evaluations by licensed CBT psychologists using the Cognitive Therapy Rating Scale (CTRS). Contribution/Results: For the first time, LLMs were systematically benchmarked against human therapists on clinical gold-standard criteria, including empathy, collaboration, cultural sensitivity, and CBT fidelity. Results indicate significantly higher CBT technical adherence in LLMs than in humans (p < 0.01), but markedly weaker empathic depth, therapeutic alliance formation, and cultural adaptation, posing a risk of "deceptive empathy." The findings expose critical safety concerns with fully automated LLM-delivered therapy and advocate a human-AI collaborative ethical framework, providing empirical grounding for the responsible integration of AI in psychological interventions.

📝 Abstract
Wider access to therapeutic care is one of the biggest challenges in mental health treatment. Due to institutional barriers, some people seeking mental health support have turned to large language models (LLMs) for personalized therapy, even though these models are largely unsanctioned and untested. We investigate the potential and limitations of using LLMs as providers of evidence-based therapy by using mixed-methods clinical metrics. Using HELPERT, a prompt run on a large language model using the same process and training as a comparative group of peer counselors, we replicated publicly accessible mental health conversations rooted in Cognitive Behavioral Therapy (CBT) to compare session dynamics and counselors' CBT-based behaviors between original peer support sessions and their reconstructed HELPERT sessions. Two licensed, CBT-trained clinical psychologists evaluated the sessions using the Cognitive Therapy Rating Scale and provided qualitative feedback. Our findings show that the peer sessions are characterized by empathy, small talk, therapeutic alliance, and shared experiences but often exhibit therapist drift. Conversely, HELPERT's reconstructed sessions exhibit minimal therapist drift and higher adherence to CBT methods but display a lack of collaboration, empathy, and cultural understanding. Through CTRS ratings and psychologists' feedback, we highlight the importance of human-AI collaboration for scalable mental health care. Our work outlines the ethical implications of imparting human-like subjective qualities to LLMs in therapeutic settings, particularly the risk of deceptive empathy, which may lead to unrealistic patient expectations and potential harm.
Problem

Research questions and friction points this paper is trying to address.

Comparing LLMs and human counselors in session-level CBT performance
Evaluating LLM adherence to CBT techniques versus human relational strategies
Assessing risks of LLM-generated deceptive empathy in therapy sessions
Innovation

Methods, ideas, or system contributions that make the work stand out.

LLMs prompted by peer counselors for CBT
Mixed-methods study with ethnography and simulation
Human-AI workflows combining CBT techniques and relational support
Zainab Iftikhar
Department of Computer Science, Brown University
Sean Ransom
Department of Psychiatry, Louisiana State University Health Sciences Center
Amy Xiao
Department of Computer Science, Brown University
Jeff Huang
Department of Computer Science, Brown University