🤖 AI Summary
Existing LLM-based depression detection methods struggle to identify subtle symptoms and lack interpretable, clinically grounded reasoning. To address this, we propose a four-stage chain-of-thought (CoT) reasoning framework, comprising sentiment analysis, binary classification, etiology identification, and severity assessment, that formalizes structured clinical reasoning for mental health diagnosis. Our approach introduces a diagnostic CoT prompting paradigm that transforms opaque LLM decisions into traceable, verifiable clinical inference pathways. It integrates CoT prompting with a progressive multi-stage architecture and is fine-tuned and evaluated on the multimodal (speech + text) E-DAIC dataset, where it achieves a 7.2% absolute improvement in classification accuracy over strong baselines. Crucially, it also generates clinically meaningful, fine-grained justifications, including stressor typology and symptom-weight distributions, thereby enhancing both discriminative performance and clinician trust.
📝 Abstract
Depression is one of the leading causes of disability worldwide, posing a severe burden on individuals, healthcare systems, and society at large. Recent advances in Large Language Models (LLMs) have shown promise in addressing mental health challenges, including the detection of depression through text-based analysis. However, current LLM-based methods often struggle with nuanced symptom identification and lack a transparent, step-by-step reasoning process, making it difficult to accurately classify and explain mental health conditions. To address these challenges, we propose a Chain-of-Thought Prompting approach that enhances both the performance and interpretability of LLM-based depression detection. Our method breaks down the detection process into four stages: (1) sentiment analysis, (2) binary depression classification, (3) identification of underlying causes, and (4) assessment of severity. By guiding the model through these structured reasoning steps, we improve interpretability and reduce the risk of overlooking subtle clinical indicators. We validate our method on the E-DAIC dataset, testing multiple state-of-the-art large language models. Experimental results indicate that our Chain-of-Thought Prompting technique outperforms baseline approaches in both classification accuracy and the granularity of diagnostic insights.
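The four-stage pipeline described above can be sketched as a chained prompting loop, where each stage's output is appended to the context seen by the next stage. This is a minimal illustrative sketch, not the authors' implementation: the stage wordings, the `diagnose` function, and the `query` callable (a wrapper around any LLM backend) are all assumptions introduced here.

```python
# Illustrative sketch of a four-stage chain-of-thought diagnostic pipeline.
# Stage instructions are hypothetical paraphrases of the stages named in the
# abstract, not the paper's actual prompts.

STAGES = [
    ("sentiment", "Step 1 - Sentiment analysis: describe the overall "
                  "emotional tone of the transcript above."),
    ("classification", "Step 2 - Binary classification: given the analysis "
                       "so far, answer 'depressed' or 'not depressed'."),
    ("etiology", "Step 3 - Underlying causes: identify plausible stressors "
                 "or causes mentioned in the transcript."),
    ("severity", "Step 4 - Severity assessment: rate the severity "
                 "(minimal, mild, moderate, severe) and justify briefly."),
]

def diagnose(transcript, query):
    """Run the four reasoning stages in order, feeding each stage the
    transcript plus all earlier stage outputs, and return a dict mapping
    stage name -> model response. `query(prompt) -> str` wraps any LLM."""
    context = f"Transcript:\n{transcript}\n"
    results = {}
    for name, instruction in STAGES:
        prompt = context + "\n" + instruction
        answer = query(prompt)
        results[name] = answer
        # Accumulate the reasoning chain so later stages can reference it,
        # which is what makes the final decision traceable step by step.
        context += f"\n{name.capitalize()}: {answer}"
    return results
```

Because every stage's prompt contains the outputs of the earlier stages, the final severity judgment can be traced back through the classification, etiology, and sentiment steps, which is the interpretability property the abstract emphasizes.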