AI Summary
Depression screening faces challenges including diagnostic complexity, ambiguous clinical criteria, and low help-seeking rates. To address these, this paper proposes a voice-based multimodal intelligent diagnostic approach that jointly models the temporal and spectral features of speech signals, marking the first effort to integrate both domains for depression assessment and overcoming the limited representational capacity of conventional single-domain analyses. We design a deep neural network architecture that enables end-to-end time-frequency joint representation learning, augmented by feature-optimization and discriminative-classification strategies. Evaluated on a publicly available depressive-speech dataset, our method achieves 92.3% classification accuracy, significantly outperforming single-domain baseline models. This work establishes a novel, non-invasive, and scalable paradigm for automated depression screening, demonstrating strong potential for clinical decision support and large-scale, community-level mental health surveillance.
Abstract
Depression, a common mental disorder, has become a prevalent issue with a significant impact on public health. However, its prevention and treatment still face multiple challenges, including complex diagnostic procedures, ambiguous criteria, and low consultation rates, which severely hinder timely assessment and intervention. To address these issues, this study adopts voice as a physiological signal and leverages its dual time- and frequency-domain (multimodal) characteristics, together with deep learning models, to develop an intelligent assessment and diagnostic algorithm for depression. Experimental results demonstrate that the proposed method achieves excellent performance on the depression-diagnosis classification task, offering new insights and approaches for the assessment, screening, and diagnosis of depression.
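The paper does not specify its feature pipeline beyond "time-frequency joint representation learning." As a rough illustration of what combining both domains can look like, the sketch below frames a waveform and concatenates simple time-domain descriptors (RMS energy, zero-crossing rate) with a log-magnitude spectrum per frame. All function names, frame sizes, and feature choices here are illustrative assumptions, not the authors' actual method.

```python
import numpy as np

def frame_signal(x, frame_len=400, hop=160):
    # Split a 1-D signal into overlapping frames (e.g. 25 ms / 10 ms at 16 kHz).
    n_frames = 1 + (len(x) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return x[idx]

def joint_time_freq_features(x, frame_len=400, hop=160):
    # Illustrative joint representation; NOT the paper's architecture.
    frames = frame_signal(x, frame_len, hop)

    # Time-domain descriptors per frame: RMS energy and zero-crossing rate.
    rms = np.sqrt(np.mean(frames ** 2, axis=1, keepdims=True))
    zcr = np.mean(np.abs(np.diff(np.sign(frames), axis=1)) > 0,
                  axis=1, keepdims=True)

    # Frequency-domain descriptors: log-magnitude spectrum of windowed frames.
    window = np.hanning(frame_len)
    log_spec = np.log1p(np.abs(np.fft.rfft(frames * window, axis=1)))

    # Concatenate both domains into one joint feature matrix
    # of shape (n_frames, 2 + frame_len // 2 + 1).
    return np.concatenate([rms, zcr, log_spec], axis=1)

# Example: one second of synthetic audio standing in for a speech clip.
rng = np.random.default_rng(0)
x = rng.standard_normal(16000)
feats = joint_time_freq_features(x)
print(feats.shape)  # (98, 203)
```

In a full system, a matrix like this (or learned equivalents of it) would feed the downstream neural classifier; the paper's end-to-end model presumably learns such representations directly from the signal rather than using hand-crafted features.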