🤖 AI Summary
Mental health disorders—including depression, anxiety, and PTSD—are frequently underdiagnosed in primary care due to reliance on subjective clinical assessment, scarcity of specialized resources, and pervasive stigma; misdiagnosis rates exceed 60%. To address this, we propose a lightweight AI diagnostic framework tailored for real-world clinical dialogues. Our approach integrates contextual snippet augmentation, zero-shot prompting, and efficient LoRA-based fine-tuning. We evaluate GPT-4.1 Mini and LLaMA in zero-shot settings and fine-tune RoBERTa models on a curated dataset of 553 semi-structured clinician-patient interviews with expert annotations. The method markedly enhances contextual awareness and few-shot generalization. Experimental results show mean classification accuracy above 80% across the three disorders, with PTSD detection achieving 89% accuracy and 98% recall—substantially outperforming conventional self-report scales. This demonstrates the feasibility and clinical utility of low-barrier, robust AI-assisted screening in resource-constrained and high-stigma settings.
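As a rough illustration of the zero-shot setup described above: each interview excerpt is wrapped in an instruction prompt and sent to the model with no task-specific training. The exact prompt wording used in the study is not given here, so the template below is a hypothetical sketch.

```python
def build_zero_shot_prompt(transcript: str, disorder: str) -> str:
    """Assemble a hypothetical zero-shot screening prompt.

    This is illustrative only; the study's actual prompt text and
    output format are not specified in this summary.
    """
    return (
        "You are a clinical screening assistant.\n"
        f"Read the following clinician-patient interview excerpt and decide "
        f"whether it shows evidence of {disorder}.\n"
        "Answer with exactly one word: YES or NO.\n\n"
        f"Interview:\n{transcript}\n"
    )

prompt = build_zero_shot_prompt(
    "I haven't slept well in weeks, and loud noises startle me badly.",
    "PTSD",
)
print(prompt)
```

The same template can be reused across the three diagnostic categories (MDE, anxiety, PTSD) by swapping the `disorder` argument, which keeps the zero-shot baselines directly comparable.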
📝 Abstract
Mental health disorders remain among the leading causes of disability worldwide, yet conditions such as depression, anxiety, and Post-Traumatic Stress Disorder (PTSD) are frequently underdiagnosed or misdiagnosed due to subjective assessments, limited clinical resources, stigma, and low awareness. In primary care settings, studies show that providers misidentify depression or anxiety in over 60% of cases, highlighting the urgent need for scalable, accessible, and context-aware diagnostic tools that can support early detection and intervention. In this study, we evaluate the effectiveness of machine learning models for mental health screening using a unique dataset of 553 real-world, semi-structured interviews, each paired with ground-truth diagnoses for major depressive episodes (MDE), anxiety disorders, and PTSD. We benchmark multiple model classes, including zero-shot prompting with GPT-4.1 Mini and Meta's LLaMA, as well as RoBERTa models fine-tuned with Low-Rank Adaptation (LoRA). Our models achieve over 80% accuracy across diagnostic categories, with especially strong performance on PTSD (up to 89% accuracy and 98% recall). We also find that using shorter, focused context segments improves recall, suggesting that focused narrative cues enhance detection sensitivity. LoRA fine-tuning proves both efficient and effective, with lower-rank configurations (e.g., rank 8 and 16) maintaining competitive performance across evaluation metrics. Our results demonstrate that LLM-based models can offer substantial improvements over traditional self-report screening tools, providing a path toward low-barrier, AI-powered early diagnosis. This work lays the groundwork for integrating machine learning into real-world clinical workflows, particularly in low-resource or high-stigma environments where access to timely mental health care is most limited.
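To make the LoRA efficiency claim concrete, the core idea can be sketched in a few lines: instead of updating a full weight matrix, LoRA trains two small low-rank factors whose product forms the weight update. The dimensions and scaling below are a minimal NumPy sketch, not the paper's actual RoBERTa configuration.

```python
import numpy as np

# LoRA sketch: the frozen pretrained weight W (d_out x d_in) is augmented
# with a trainable low-rank update (alpha / r) * B @ A, where
# A is (r x d_in) and B is (d_out x r). Shapes here are illustrative.
d_in, d_out, rank, alpha = 768, 768, 8, 16

rng = np.random.default_rng(0)
W = rng.standard_normal((d_out, d_in))          # frozen pretrained weight
A = rng.standard_normal((rank, d_in)) * 0.01    # trainable factor, small init
B = np.zeros((d_out, rank))                     # zero init, so the update starts at 0

def lora_forward(x: np.ndarray) -> np.ndarray:
    """Apply the adapted linear layer: (W + (alpha / rank) * B @ A) @ x."""
    scale = alpha / rank
    return W @ x + scale * (B @ (A @ x))

# Trainable parameters: rank * (d_in + d_out) instead of d_in * d_out.
full_params = d_in * d_out            # 589,824
lora_params = rank * (d_in + d_out)   # 12,288 at rank 8 (~2% of the full matrix)
```

This is why low ranks such as 8 or 16 are attractive: the trainable-parameter count grows linearly with the rank, so halving the rank roughly halves adapter size while, per the results above, keeping performance competitive.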