Can LLMs Address Mental Health Questions? A Comparison with Human Therapists

📅 2025-09-15
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Prior research lacks systematic, empirically grounded comparisons between large language models (LLMs) and licensed psychotherapists in authentic clinical query scenarios. Method: This study conducts the first comparative evaluation of ChatGPT, Gemini, and Llama against human therapists on real patient questions, integrating computational text analysis (assessing readability and sentiment polarity) with user surveys (measuring perceived supportiveness, respectfulness, and acceptability). Contribution/Results: LLM responses significantly outperformed human therapists in linguistic clarity, respectfulness, and supportive tone. However, both end users and clinical experts consistently preferred human therapists for emotional depth, therapeutic alliance formation, and privacy assurance. The findings delineate a viable scope for LLMs, namely lightweight, adjunctive mental health support, while highlighting critical limitations concerning relational authenticity, contextual nuance, and ethical safeguards. This work provides empirical grounding for defining appropriate boundaries and designing ethically robust AI-augmented psychological services.

📝 Abstract
Limited access to mental health care has motivated the use of digital tools and conversational agents powered by large language models (LLMs), yet their quality and reception remain unclear. We present a study comparing therapist-written responses to those generated by ChatGPT, Gemini, and Llama for real patient questions. Text analysis showed that LLMs produced longer, more readable, and lexically richer responses with a more positive tone, while therapist responses were more often written in the first person. In a survey with 150 users and 23 licensed therapists, participants rated LLM responses as clearer, more respectful, and more supportive than therapist-written answers. Yet, both groups of participants expressed a stronger preference for human therapist support. These findings highlight the promise and limitations of LLMs in mental health, underscoring the need for designs that balance their communicative strengths with concerns of trust, privacy, and accountability.
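As a concrete illustration of the text-analysis dimensions the abstract reports (length, readability, lexical richness, tone, and first-person usage), here is a minimal Python sketch. The specific metric and library choices (Flesch reading ease via textstat, type-token ratio for lexical richness, VADER compound score for sentiment polarity) are assumptions for illustration only; the paper does not specify which tools it used.

```python
# Illustrative sketch of the kind of per-response text analysis described
# in the abstract. Metric choices here are assumptions, not the paper's
# documented pipeline.
import re
import textstat  # pip install textstat
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer  # pip install vaderSentiment

FIRST_PERSON = {"i", "me", "my", "mine", "myself", "we", "us", "our", "ours"}
_analyzer = SentimentIntensityAnalyzer()

def describe_response(text: str) -> dict:
    tokens = re.findall(r"[a-z']+", text.lower())
    n = max(len(tokens), 1)
    return {
        "length_words": len(tokens),
        # Higher Flesch score = easier to read.
        "readability": textstat.flesch_reading_ease(text),
        # Type-token ratio as a simple lexical-richness proxy.
        "lexical_richness": len(set(tokens)) / n,
        # VADER compound score: -1 (most negative) to +1 (most positive).
        "sentiment": _analyzer.polarity_scores(text)["compound"],
        # Share of tokens that are first-person pronouns.
        "first_person_rate": sum(t in FIRST_PERSON for t in tokens) / n,
    }

# Comparing a therapist-style reply with an LLM-style reply:
print(describe_response("I hear how hard this has been for you. I've seen this before."))
print(describe_response("It is understandable to feel overwhelmed. Consider these supportive steps."))
```

On responses like the two above, such metrics would surface the contrasts the study reports: the more impersonal reply scores higher on positivity and lower on first-person usage, while the therapist-style reply is dominated by first-person framing.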
Problem

Research questions and friction points this paper is trying to address.

Evaluating LLM effectiveness in mental health responses
Comparing AI-generated and human therapist answer quality
Assessing user preference between AI and human support
Innovation

Methods, ideas, or system contributions that make the work stand out.

Using LLMs to generate mental health responses
Comparing multiple AI models against human therapists
Balancing AI communication strengths with ethical concerns
Synthia Wang
University of Chicago
Yuwei Cheng
University of Chicago
Austin Song
University of Virginia
Sarah Keedy
University of Chicago
Marc Berman
University of Chicago
Nick Feamster
University of Chicago