ReDSM5: A Reddit Dataset for DSM-5 Depression Detection

📅 2025-08-05
🤖 AI Summary
Existing depression detection methods mostly perform post-level binary classification and do not link language to the DSM-5 clinical diagnostic criteria, which limits interpretability and clinical relevance. To address this, we introduce ReDSM5, a novel sentence-level, multi-label dataset for depression symptom annotation on Reddit. Annotated by a licensed psychologist, it comprises 1484 long-form posts labeled for the nine DSM-5 depression symptoms, and each label is accompanied by a concise, clinically grounded rationale. An exploratory analysis examines the lexical, syntactic, and emotional patterns that characterize symptom expression in social media narratives. We also establish baseline benchmarks for multi-label symptom classification and explanation generation as reference results for future work. We publicly release the ReDSM5 dataset and benchmark models.

📝 Abstract
Depression is a pervasive mental health condition that affects hundreds of millions of individuals worldwide, yet many cases remain undiagnosed due to barriers in traditional clinical access and pervasive stigma. Social media platforms, and Reddit in particular, offer rich, user-generated narratives that can reveal early signs of depressive symptomatology. However, existing computational approaches often label entire posts simply as depressed or not depressed, without linking language to specific criteria from the DSM-5, the standard clinical framework for diagnosing depression. This limits both clinical relevance and interpretability. To address this gap, we introduce ReDSM5, a novel Reddit corpus comprising 1484 long-form posts, each exhaustively annotated at the sentence level by a licensed psychologist for the nine DSM-5 depression symptoms. For each label, the annotator also provides a concise clinical rationale grounded in DSM-5 methodology. We conduct an exploratory analysis of the collection, examining lexical, syntactic, and emotional patterns that characterize symptom expression in social media narratives. Compared to prior resources, ReDSM5 uniquely combines symptom-specific supervision with expert explanations, facilitating the development of models that not only detect depression but also generate human-interpretable reasoning. We establish baseline benchmarks for both multi-label symptom classification and explanation generation, providing reference results for future research on detection and interpretability.
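
To make the sentence-level, multi-label setup described in the abstract concrete, the following minimal sketch shows one way such an annotation could be represented and turned into a training target. It is not the released dataset's schema: the `SentenceAnnotation` record, its field names, the short symptom label names, and the example sentence are all assumptions made for illustration. Only the count of nine DSM-5 symptoms and the pairing of each label with a rationale come from the abstract.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# The nine DSM-5 depressive symptoms. These short label names are illustrative
# assumptions and may not match the identifiers used in the released dataset.
SYMPTOMS = (
    "depressed_mood",
    "anhedonia",
    "appetite_change",
    "sleep_issues",
    "psychomotor",
    "fatigue",
    "worthlessness",
    "cognitive_issues",
    "suicidal_thoughts",
)


@dataclass
class SentenceAnnotation:
    """Hypothetical record: one sentence, its symptom labels, and rationales."""

    post_id: str
    sentence: str
    # Maps each assigned symptom label to the annotator's rationale for it.
    rationales: Dict[str, str] = field(default_factory=dict)

    def multi_hot(self) -> List[int]:
        """Encode the assigned labels as a 9-dimensional 0/1 target vector."""
        unknown = set(self.rationales) - set(SYMPTOMS)
        if unknown:
            raise ValueError(f"unknown symptom labels: {sorted(unknown)}")
        return [int(symptom in self.rationales) for symptom in SYMPTOMS]


example = SentenceAnnotation(
    post_id="post-001",  # made-up identifier
    sentence="I barely sleep and I am exhausted all day.",  # invented text
    rationales={
        "sleep_issues": "Reports insomnia.",
        "fatigue": "Reports persistent exhaustion.",
    },
)
print(example.multi_hot())  # [0, 0, 0, 1, 0, 1, 0, 0, 0]
```

A sentence with no symptom maps to the all-zero vector, which is why a multi-hot target fits this task better than a single class index.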
Problem

Research questions and friction points this paper is trying to address.

Detect depression symptoms in Reddit posts by linking language to specific DSM-5 criteria
Improve clinical relevance and interpretability of depression detection models
Provide expert-annotated dataset for symptom-specific classification and reasoning
Innovation

Methods, ideas, or system contributions that make the work stand out.

Reddit posts annotated for DSM-5 symptoms
Expert-labeled, sentence-level symptom annotations
Clinical rationales for each label that support interpretable models
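
As a rough illustration of how sentence-level labels with rationales can support interpretable output, the sketch below groups annotated sentences of one post into a per-symptom profile that keeps the evidence sentence next to its rationale. The `symptom_profile` helper, the triple layout, the label names, and the example text are hypothetical and are not taken from the paper or its released code.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical layout: (sentence, symptom label, rationale) triples for one post.
Annotation = Tuple[str, str, str]


def symptom_profile(annotations: List[Annotation]) -> Dict[str, List[Dict[str, str]]]:
    """Group evidence sentences and rationales by symptom label."""
    profile = defaultdict(list)
    for sentence, symptom, rationale in annotations:
        profile[symptom].append({"evidence": sentence, "rationale": rationale})
    return dict(profile)


# Invented example data, for illustration only.
post_annotations = [
    ("I barely sleep anymore.", "sleep_issues", "Reports insomnia."),
    ("Nothing I used to enjoy feels fun.", "anhedonia", "Loss of interest."),
    ("I wake up at 3am every night.", "sleep_issues", "Early awakening."),
]

profile = symptom_profile(post_annotations)
for symptom, items in profile.items():
    print(symptom, len(items))
```

Keeping the evidence and rationale together lets a reader trace each post-level symptom back to the sentences that triggered it.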