🤖 AI Summary
This study addresses the fragmentation, lack of systematic indexing, and absence of dynamic curation mechanisms in publicly available social media data for depression modeling. We systematically surveyed and curated 32 high-quality, depression-related social media datasets published between 2019 and 2024. Through meta-analysis, cross-platform provenance tracing, and structured annotation, we constructed the first standardized, continuously maintained inventory of depression-oriented social media resources, spanning five years. Concurrently, we developed SocialDepressionDB, a searchable, interactive online knowledge base that fills a critical gap in authoritative, domain-specific resource indexing. The database supports precise dataset discovery, comparative analysis, and longitudinal tracking of data evolution. It has been adopted by multiple NLP and digital mental health research teams, enhancing data accessibility and methodological reproducibility in computational modeling of depressive language patterns.
📝 Abstract
Depression is the most common mental health disorder, and its prevalence increased during the COVID-19 pandemic. It is also one of the most extensively researched psychological conditions, and recent research has increasingly focused on leveraging social media data to enhance traditional methods of depression screening. This paper addresses the growing interest in interdisciplinary research on depression and aims to support early-career researchers by providing a comprehensive and up-to-date list of datasets for analyzing and predicting depression through social media data. We present an overview of datasets published between 2019 and 2024. We also make the full list of datasets available online as a continuously updated resource, in the hope that it will facilitate further interdisciplinary research into the linguistic expressions of depression on social media.