Detecting PTSD in Clinical Interviews: A Comparative Analysis of NLP Methods and Large Language Models

📅 2025-04-01

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

PTSD is frequently underdiagnosed in clinical practice, which motivates automated, NLP-based detection. This study evaluates domain-adapted pre-trained models, embedding strategies, and large language model (LLM) prompting for identifying PTSD from clinical interview transcripts in the DAIC-WOZ dataset. Three families of approaches are compared: (1) fine-tuned transformer models, both general and domain-specific (e.g., Mental-RoBERTa); (2) embedding-based classifiers (Sentence-BERT, LLaMA+MLP); and (3) zero-shot, few-shot, and chain-of-thought prompting. Key findings: Mental-RoBERTa outperforms RoBERTa-base (F1 = 0.643 vs. 0.485); zero-shot prompting with DSM-5 criteria reaches F1 = 0.657 without training data; and LLaMA embeddings combined with an MLP yield the best performance (F1 = 0.700). Performance is higher for severe PTSD cases and for patients with comorbid depression. The results point to domain-adapted embeddings and LLMs as candidates for scalable screening, while detection of more nuanced presentations still needs improvement.

📝 Abstract

Post-Traumatic Stress Disorder (PTSD) remains underdiagnosed in clinical settings, creating an opportunity for automated detection to help identify patients. This study evaluates natural language processing approaches for detecting PTSD from clinical interview transcripts. We compared general and mental health-specific transformer models (BERT/RoBERTa), embedding-based methods (SentenceBERT/LLaMA), and large language model prompting strategies (zero-shot/few-shot/chain-of-thought) using the DAIC-WOZ dataset. Domain-specific models significantly outperformed general models (Mental-RoBERTa F1=0.643 vs. RoBERTa-base F1=0.485). LLaMA embeddings with neural networks achieved the highest performance (F1=0.700). Zero-shot prompting using DSM-5 criteria yielded competitive results without training data (F1=0.657). Performance varied significantly across symptom severity and comorbidity status, with higher accuracy for severe PTSD cases and patients with comorbid depression. Our findings highlight the potential of domain-adapted embeddings and LLMs for scalable screening and underscore the need for improved detection of nuanced presentations. They also offer insights for developing clinically viable AI tools for PTSD assessment.

Problem

Research questions and friction points this paper is trying to address.

Evaluating NLP methods for PTSD detection in clinical interviews

Comparing transformer models and LLM strategies for diagnosis accuracy

Assessing performance variations across symptom severity and comorbidities
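The third question, how performance varies across subgroups, amounts to computing the same metric separately for each slice of the evaluation set. Below is a minimal sketch of that idea for binary F1, assuming each prediction is stored as a dict; the field names (`label`, `pred`, `depressed`) and the toy records are hypothetical and are not taken from the paper or from DAIC-WOZ.

```python
from collections import defaultdict


def f1_score(y_true, y_pred):
    """Binary F1 for the positive class (1 = PTSD, 0 = no PTSD)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        # No true positives: precision or recall is zero (or undefined),
        # so F1 is reported as 0.0 by convention.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def f1_by_subgroup(records, key):
    """Group records by records[i][key] and compute F1 within each group."""
    groups = defaultdict(lambda: ([], []))
    for record in records:
        truths, preds = groups[record[key]]
        truths.append(record["label"])
        preds.append(record["pred"])
    return {group: f1_score(t, p) for group, (t, p) in groups.items()}


# Made-up records for illustration only (hypothetical field names and values).
toy_records = [
    {"label": 1, "pred": 1, "depressed": True},
    {"label": 1, "pred": 1, "depressed": True},
    {"label": 0, "pred": 0, "depressed": True},
    {"label": 1, "pred": 0, "depressed": True},
    {"label": 1, "pred": 0, "depressed": False},
    {"label": 0, "pred": 1, "depressed": False},
    {"label": 0, "pred": 0, "depressed": False},
    {"label": 1, "pred": 1, "depressed": False},
]

scores = f1_by_subgroup(toy_records, "depressed")
print(scores)
```

Reporting the metric per subgroup, rather than only over the pooled set, is what makes gaps such as the one between comorbid and non-comorbid cases visible.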

Innovation

Methods, ideas, or system contributions that make the work stand out.

Domain-specific transformer models outperform general models

LLaMA embeddings with neural networks achieve highest performance

Zero-shot prompting using DSM-5 criteria yields competitive results
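Zero-shot prompting with DSM-5 criteria can be pictured as two small steps: assemble a prompt that lists the symptom clusters alongside the transcript, then map the model's free-text answer to a binary label. The sketch below illustrates only those two steps. The prompt wording, the paraphrased cluster descriptions, and the function names are assumptions for illustration, not the prompts used in the paper, and the call to an actual LLM is deliberately left out.

```python
import re

# DSM-5 PTSD symptom clusters B-E, paraphrased for illustration.
# The exact wording used in the paper's prompts is not given in this summary.
DSM5_CLUSTERS = {
    "B": "intrusion symptoms (e.g. unwanted memories, nightmares, flashbacks)",
    "C": "avoidance of trauma-related thoughts, feelings, or reminders",
    "D": "negative alterations in cognition and mood",
    "E": "marked alterations in arousal and reactivity",
}


def build_zero_shot_prompt(transcript: str) -> str:
    """Assemble a zero-shot prompt: criteria first, then the transcript."""
    criteria = "\n".join(
        f"- Criterion {letter}: {text}" for letter, text in DSM5_CLUSTERS.items()
    )
    return (
        "You are assisting with PTSD screening.\n"
        "Consider these DSM-5 symptom clusters:\n"
        f"{criteria}\n\n"
        "Interview transcript:\n"
        f"{transcript.strip()}\n\n"
        "Does the participant show signs of PTSD? "
        "Answer with exactly one word: YES or NO."
    )


def parse_label(response: str):
    """Map a free-text answer to 1 (PTSD), 0 (no PTSD), or None if unclear.

    Uses the first standalone YES or NO found in the response.
    """
    match = re.search(r"\b(YES|NO)\b", response.upper())
    if match is None:
        return None
    return 1 if match.group(1) == "YES" else 0


prompt = build_zero_shot_prompt("Participant: I still have trouble sleeping.")
print(prompt)
print(parse_label("Yes, several intrusion symptoms are described."))
```

Returning `None` for unparseable answers keeps ambiguous model outputs out of the metric instead of silently counting them as negatives; how such cases were handled in the study is not stated here.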
