The Pursuit of Empathy: Evaluating Small Language Models for PTSD Dialogue Support

📅 2025-05-21
🤖 AI Summary
This study investigates the empathic capabilities of small language models (SLMs) with 0.5 to 5 billion parameters in trauma-informed dialogues for post-traumatic stress disorder (PTSD). We introduce TIDE, a novel dataset of 10,000 two-turn dialogues grounded in 500 realistic user personas, and propose a PTSD-specific three-dimensional empathy evaluation framework covering emotion recognition, distress normalization, and supportive reflection, with scenarios and reference responses reviewed by a clinical psychologist specializing in PTSD. Through fine-tuning and an IRB-approved human evaluation combined with automatic metrics, we identify a pronounced “ceiling effect” in SLM empathy alongside demographic differences: older adults value distress validation and graduate-educated users prefer nuanced replies. Automated metrics, meanwhile, correlate only weakly with human judgments. The TIDE dataset will be publicly released to advance lightweight, safe, and trustworthy AI for mental health applications.
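The summary notes that automated metrics correlate only weakly with human judgments. One common way to quantify that kind of agreement is Spearman's rank correlation between per-response metric scores and human empathy ratings. The sketch below is illustrative only: the paper's actual metrics, rating scales, and correlation procedure are not described on this page, and the function names (`average_ranks`, `pearson`, `spearman`) are hypothetical.

```python
from math import sqrt
from typing import Sequence


def average_ranks(values: Sequence[float]) -> list[float]:
    """Assign 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j map to ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(sxx * syy)


def spearman(metric_scores: Sequence[float], human_ratings: Sequence[float]) -> float:
    """Spearman's rho: Pearson correlation computed on average ranks."""
    if len(metric_scores) != len(human_ratings) or len(metric_scores) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    return pearson(average_ranks(metric_scores), average_ranks(human_ratings))


if __name__ == "__main__":
    # Toy values for illustration only; these are NOT results from the paper.
    metric = [1, 2, 3, 4, 5]   # e.g. an automatic score per response
    human = [2, 1, 4, 3, 5]    # e.g. a mean human empathy rating per response
    print(round(spearman(metric, human), 3))
```

Rank correlation is used here because human ratings are typically ordinal (for example, Likert-style scales), so agreement in ordering is often more meaningful than agreement in raw magnitude; a value near zero would indicate the kind of weak alignment the summary describes.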

📝 Abstract
Can small language models with 0.5B to 5B parameters meaningfully engage in trauma-informed, empathetic dialogue for individuals with PTSD? We address this question by introducing TIDE, a dataset of 10,000 two-turn dialogues spanning 500 diverse PTSD client personas and grounded in a three-factor empathy model: emotion recognition, distress normalization, and supportive reflection. All scenarios and reference responses were reviewed for realism and trauma sensitivity by a clinical psychologist specializing in PTSD. We evaluate eight small language models before and after fine-tuning, comparing their outputs to a frontier model (Claude 3.5 Sonnet). Our IRB-approved human evaluation and automatic metrics show that fine-tuning generally improves perceived empathy, but gains are highly scenario- and user-dependent, with smaller models facing an empathy ceiling. Demographic analysis shows that older adults value distress validation and graduate-educated users prefer nuanced replies, while gender effects are minimal. We highlight the limitations of automatic metrics and the need for context- and user-aware system design. Our findings, along with the planned release of TIDE, provide a foundation for building safe, resource-efficient, and ethically sound empathetic AI to supplement, not replace, clinical mental health care.

Problem

Evaluating small language models for empathetic PTSD dialogue support
Assessing fine-tuned models' empathy gains across diverse scenarios
Analyzing demographic preferences in trauma-informed AI responses

Innovation

Fine-tuned small language models for PTSD dialogue
TIDE dataset with 10,000 trauma-informed dialogues
Guidance toward context- and user-aware system design for empathetic AI

Authors

BN Suhas
College of Information Sciences and Technology, Penn State University, USA

Yash Mahajan
Auburn University
NLP, XAI, Evaluation, LLMs

Dominik Mattioli
College of Information Sciences and Technology, Penn State University, USA

Andrew M. Sherrill
Department of Psychiatry and Behavioral Sciences, Emory University, USA

Rosa I. Arriaga
Associate Professor, Interactive Computing, Georgia Tech
HCI, mHealth, social computing, cognitive science

Chris W. Wiese
School of Psychology, Georgia Tech, USA

Saeed Abdullah
Penn State
HCI, Digital Health, mHealth, HCAI