AI Summary
Early depression detection faces two key challenges: it requires professional medical knowledge, and it must balance interpretability against predictive accuracy. To address these, we propose DORIS, a novel framework that explicitly incorporates clinical diagnostic criteria (e.g., DSM-5) into a large language model (LLM)-driven detection pipeline. DORIS employs a medical-knowledge-guided LLM annotation mechanism to identify high-emotion-intensity textual segments, constructs temporal emotion trajectories, and uses a hybrid architecture that combines LLM-based representation learning with a traditional classifier for dynamic modeling. This design yields both high accuracy and clinically meaningful interpretability. On benchmark datasets, DORIS improves AUPRC by 0.036 over the strongest baseline, and extensive experiments demonstrate its robustness and clinical translatability in real-world social media settings.
Abstract
Depression causes serious harm. However, due to a lack of mental health awareness and fear of stigma, many patients do not actively seek diagnosis and treatment, leading to detrimental outcomes. Depression detection aims to determine whether an individual suffers from depression by analyzing their history of posts on social media, which can significantly aid early detection and intervention. The task faces two key challenges: 1) it requires professional medical knowledge, and 2) it demands both high accuracy and explainability. To address these challenges, we propose a novel depression detection system called DORIS, which combines medical knowledge with recent advances in large language models (LLMs). For the first challenge, we propose an LLM-based solution that first annotates whether high-risk texts meet medical diagnostic criteria. We further retrieve texts with high emotional intensity and summarize critical information from users' historical mood records, the so-called mood courses. For the second challenge, we combine an LLM with a traditional classifier to integrate the medical-knowledge-guided features, so that the model can also explain its predictions, achieving both high accuracy and explainability. Extensive experiments on benchmark datasets show that our approach improves AUPRC by 0.036 over the strongest baseline, a margin that can be considered significant and that demonstrates the effectiveness of our approach and its high value as an NLP application.
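The pipeline described above can be sketched in a few lines. This is a minimal illustrative mock, not the paper's implementation: the `llm` callable, the prompt wording, the 0-1 intensity scale, and the fixed fusion weights (0.6/0.4) are all assumptions made for the example, and the linear scorer stands in for whatever traditional classifier the full system trains on the medical-knowledge-guided features.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# Hypothetical LLM interface: maps a prompt string to a text response.
LLM = Callable[[str], str]

@dataclass
class Post:
    text: str
    timestamp: int  # posting time, used to order the mood course

def annotate_criteria(llm: LLM, posts: List[Post]) -> List[bool]:
    """Step 1: LLM annotates whether each post meets a diagnostic criterion."""
    return [
        llm(f"Does this post match a DSM-5 depression symptom? {p.text}") == "yes"
        for p in posts
    ]

def emotion_intensity(llm: LLM, post: Post) -> float:
    """Step 2: LLM rates emotional intensity on an assumed 0-1 scale."""
    return float(llm(f"Rate the emotional intensity from 0 to 1: {post.text}"))

def mood_course(llm: LLM, posts: List[Post], top_k: int = 3) -> str:
    """Step 3: summarize the k most intense posts, in time order."""
    ranked = sorted(posts, key=lambda p: emotion_intensity(llm, p), reverse=True)
    selected = sorted(ranked[:top_k], key=lambda p: p.timestamp)
    return llm("Summarize this mood trajectory: " + " | ".join(p.text for p in selected))

def detect(llm: LLM, posts: List[Post], threshold: float = 0.5) -> Tuple[bool, str]:
    """Step 4: fuse the features with a simple linear scorer (illustrative
    weights; the actual system uses a trained traditional classifier)."""
    hits = annotate_criteria(llm, posts)
    symptom_rate = sum(hits) / len(posts)
    mean_intensity = sum(emotion_intensity(llm, p) for p in posts) / len(posts)
    score = 0.6 * symptom_rate + 0.4 * mean_intensity
    return score >= threshold, mood_course(llm, posts)

# Usage with a stub LLM, so the sketch runs without any model access.
def stub_llm(prompt: str) -> str:
    if prompt.startswith("Does"):
        return "yes" if "sad" in prompt else "no"
    if prompt.startswith("Rate"):
        return "0.9" if "sad" in prompt else "0.1"
    return "mostly low mood with brief recovery"

posts = [Post("feeling sad", 1), Post("an ok day", 2), Post("so sad again", 3)]
flag, course = detect(stub_llm, posts)
```

Returning the mood-course summary alongside the decision is what gives the prediction its explanation: the classifier's inputs are human-readable clinical features rather than opaque embeddings alone.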