Assessing the Effectiveness of LLMs in Delivering Cognitive Behavioral Therapy

📅 2026-03-04

🤖 AI Summary

This study examines the growing use of large language models (LLMs) to deliver cognitive behavioral therapy (CBT) support, even though such use lacks professional validation. It systematically evaluates both generation-only and retrieval-augmented generation (RAG) approaches at reproducing authentic CBT dialogues drawn from role-play sessions with licensed therapists. The evaluation uses a multidimensional assessment framework that combines standard natural language generation (NLG) metrics, natural language inference (NLI), and automated skill scoring to measure linguistic quality, semantic coherence, and therapeutic fidelity, including empathy. The findings show that although LLMs can produce CBT-like conversations, they are clearly limited in expressing empathy and in keeping the dialogue consistent. The work thus provides a methodological basis and empirical evidence to guide the future development and evaluation of AI systems for mental health applications.
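The summary does not name the specific NLG metrics used. As one illustration of the kind of reference-based metric such a framework can include, the sketch below computes a ROUGE-1-style unigram F1 between a model's response and the therapist's reference utterance. It is a simplified stand-in, assuming regex word tokenization with no stemming, and is neither the official ROUGE package nor necessarily the paper's implementation. The function names and example sentences are hypothetical.

```python
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Lowercase and keep word tokens; a simplification (no stemming, no stopword removal).
    return re.findall(r"[a-z0-9']+", text.lower())


def unigram_f1(candidate: str, reference: str) -> float:
    """ROUGE-1-style F1: clipped unigram overlap between a candidate and a reference."""
    cand = Counter(tokenize(candidate))
    ref = Counter(tokenize(reference))
    # Counter intersection keeps the minimum count of each shared token (clipping).
    overlap = sum((cand & ref).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


# Hypothetical therapist reference turn and model response.
reference = "It sounds like you felt anxious before the meeting."
candidate = "It sounds like the meeting made you anxious."
print(round(unigram_f1(candidate, reference), 3))  # → 0.824
```

A high overlap score like this says nothing about empathy or therapeutic fidelity, which is why lexical metrics are typically paired with NLI-based and skill-based scoring in an evaluation like the one described.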

📝 Abstract

As mental health issues continue to rise globally, there is an increasing demand for accessible and scalable therapeutic solutions. Many individuals currently seek support from Large Language Models (LLMs), even though these models have not been validated for use in counseling services. In this paper, we evaluate LLMs' ability to emulate professional therapists practicing Cognitive Behavioral Therapy (CBT). Using anonymized, transcribed role-play sessions between licensed therapists and clients, we compare two approaches: (1) a generation-only method and (2) a Retrieval-Augmented Generation (RAG) approach using CBT guidelines. We evaluate both proprietary and open-source models for linguistic quality, semantic coherence, and therapeutic fidelity using standard natural language generation (NLG) metrics, natural language inference (NLI), and automated scoring for skills assessment. Our results indicate that while LLMs can generate CBT-like dialogues, they are limited in their ability to convey empathy and maintain consistency.
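To make the two compared conditions concrete, the following sketch contrasts a generation-only prompt with a RAG prompt that prepends retrieved CBT guideline text. Everything here is hypothetical: the guideline snippets, the function names, and the Jaccard word-overlap retriever, which is a simple stand-in for whatever retriever the authors actually used. No model is called; the code only builds the prompt.

```python
import re

# Hypothetical CBT guideline snippets; the paper's actual guideline corpus is not shown here.
GUIDELINES = [
    "Reflect the client's feelings before challenging a thought.",
    "Identify automatic thoughts and examine the evidence for and against them.",
    "Collaboratively set an agenda at the start of the session.",
]


def tokens(text: str) -> set[str]:
    # Lowercased word set used for overlap scoring.
    return set(re.findall(r"[a-z']+", text.lower()))


def retrieve(query: str, docs: list[str], k: int = 1) -> list[str]:
    # Rank documents by Jaccard similarity of word sets (a stand-in for a dense retriever).
    q = tokens(query)

    def score(doc: str) -> float:
        d = tokens(doc)
        union = q | d
        return len(q & d) / len(union) if union else 0.0

    return sorted(docs, key=score, reverse=True)[:k]


def build_prompt(client_turn: str, use_rag: bool, docs: list[str] = GUIDELINES, k: int = 1) -> str:
    # Generation-only: instruction plus the client turn.
    instruction = "You are a therapist practicing Cognitive Behavioral Therapy."
    if use_rag:
        # RAG: additionally inject the top-k retrieved guideline snippets.
        context = "\n".join(f"- {g}" for g in retrieve(client_turn, docs, k))
        instruction += "\nRelevant CBT guidelines:\n" + context
    return f"{instruction}\nClient: {client_turn}\nTherapist:"


turn = "I keep having automatic thoughts that I will fail."
print(build_prompt(turn, use_rag=False))
print(build_prompt(turn, use_rag=True))
```

In a full pipeline, each prompt would be sent to the proprietary and open-source models under comparison, and their responses scored against the therapist's actual reply in the transcribed session.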

Problem

Research questions and friction points this paper is trying to address.

Large Language Models

Cognitive Behavioral Therapy

Therapeutic Fidelity

Empathy

Mental Health

Innovation

Methods, ideas, or system contributions that make the work stand out.

Large Language Models

Cognitive Behavioral Therapy

Retrieval-Augmented Generation