🤖 AI Summary
Existing depression assessment methods rely heavily on non-clinical data and employ complex models with limited clinical deployability. Method: This work proposes a clinically grounded, multimodal depression assessment framework tailored to real-world diagnostic settings. We introduce the C-MIND dataset, which comprises synchronized audio, video, transcribed text, and functional near-infrared spectroscopy (fNIRS) neuroimaging signals, and design a clinical-knowledge-guided large language model (LLM) inference mechanism that integrates behavioral analysis with psychiatric diagnostic reasoning. Furthermore, we systematically quantify the contribution of each modality to the diagnostic task. Results: Evaluated on authentic clinical data, guiding the LLM with clinical knowledge improves its Macro-F1 score by up to 10%, while enhancing model interpretability and practical deployability. To our knowledge, this is the first automated depression assessment framework grounded in real clinical workflows, balancing diagnostic reliability with clinical utility.
📝 Abstract
Depression is a widespread mental disorder that affects millions worldwide. While automated depression assessment shows promise, most studies rely on limited or non-clinically validated data, and often prioritize complex model design over real-world effectiveness. In this paper, we aim to unveil the landscape of clinical depression assessment. We introduce C-MIND, a clinical neuropsychiatric multimodal diagnosis dataset collected over two years from real hospital visits. Each participant completes three structured psychiatric tasks and receives a final diagnosis from expert clinicians, with informative audio, video, transcript, and functional near-infrared spectroscopy (fNIRS) signals recorded. Using C-MIND, we first analyze behavioral signatures relevant to diagnosis. We train a range of classical models to quantify how different tasks and modalities contribute to diagnostic performance, and dissect the effectiveness of their combinations. We then explore whether LLMs can perform psychiatric reasoning like clinicians and identify their clear limitations in realistic clinical settings. In response, we propose to guide the reasoning process with clinical expertise, which consistently improves LLM diagnostic performance by up to 10% in Macro-F1 score. We aim to build an infrastructure for clinical depression assessment from both data and algorithmic perspectives, enabling C-MIND to facilitate grounded and reliable research for mental healthcare.
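For readers unfamiliar with the reported metric, the following is a minimal sketch of how a Macro-F1 score is computed: the F1 score is calculated separately for each class and then averaged without weighting, so a minority class counts as much as the majority class. The function name `macro_f1` and the toy labels are illustrative assumptions; they are not taken from C-MIND or from the authors' evaluation code.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores (illustrative helper)."""
    labels = sorted(set(y_true) | set(y_pred))
    per_class_f1 = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        denom = 2 * tp + fp + fn
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the class never appears
        per_class_f1.append(2 * tp / denom if denom else 0.0)
    return sum(per_class_f1) / len(per_class_f1)


# Toy, made-up labels (1 = depressed, 0 = control); not real study data.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 0, 0, 0, 0, 0, 0, 1]
score = macro_f1(y_true, y_pred)
print(f"Macro-F1: {score:.4f}")
```

In this toy case the classifier gets 5 of 8 predictions right, yet the Macro-F1 is noticeably lower than that accuracy because the poorly recognized positive class pulls the average down. That sensitivity to per-class performance is one common reason to report Macro-F1 on imbalanced diagnostic data.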