🤖 AI Summary
This work addresses the challenge of early, precise symptom severity scoring for depression and post-traumatic stress disorder (PTSD). We propose a multimodal deep learning framework that jointly models clinical interview transcripts and corresponding audio signals. Our key innovation is a collaborative LSTM-BiLSTM cross-modal feature fusion mechanism that integrates semantic representations with acoustic prosodic features—including rhythm, tone, and fundamental frequency (F0)—to enhance detection of subtle psychiatric manifestations. The framework employs end-to-end training for simultaneous disorder classification and continuous symptom severity estimation. Experimental results demonstrate strong performance: 92% accuracy for depression classification and 93% for PTSD on held-out test sets—both significantly surpassing unimodal baselines. The approach offers a deployable, clinically viable solution to support timely, data-driven early intervention.
📝 Abstract
Mental health conditions such as depression and Post-Traumatic Stress Disorder (PTSD) greatly impact individuals' general well-being, underscoring the importance of early detection and precise diagnosis to facilitate prompt clinical intervention. This paper presents an advanced multimodal deep learning system for the automated classification of depression and PTSD. Using textual and audio data from clinical interview datasets, the method fuses features extracted from both modalities through a combination of LSTM (Long Short-Term Memory) and BiLSTM (Bidirectional Long Short-Term Memory) architectures. While text features capture the semantic and grammatical components of speech, audio features capture vocal traits including rhythm, tone, and pitch. This combination of modalities enhances the model's capacity to identify subtle patterns associated with mental health conditions. On held-out test sets, the proposed method achieves classification accuracies of 92% for depression and 93% for PTSD, outperforming traditional unimodal approaches and demonstrating its accuracy and robustness.
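The fusion architecture described above can be sketched in PyTorch. This is a minimal, hypothetical illustration, not the authors' implementation: the input dimensions, hidden size, and late-fusion-by-concatenation design are assumptions chosen for clarity. An LSTM encodes the audio feature sequence, a BiLSTM encodes the text embedding sequence, and their final hidden states are concatenated and passed to a classification head.

```python
import torch
import torch.nn as nn

class MultimodalClassifier(nn.Module):
    """Hypothetical sketch: LSTM audio encoder + BiLSTM text encoder,
    fused by concatenating final hidden states (dims are assumptions)."""

    def __init__(self, audio_dim=40, text_dim=300, hidden=64, n_classes=2):
        super().__init__()
        # Unidirectional LSTM over acoustic frames (e.g. MFCC-like features)
        self.audio_lstm = nn.LSTM(audio_dim, hidden, batch_first=True)
        # Bidirectional LSTM over token embeddings
        self.text_bilstm = nn.LSTM(text_dim, hidden, batch_first=True,
                                   bidirectional=True)
        # Classifier over the fused representation (hidden + 2*hidden dims)
        self.head = nn.Sequential(
            nn.Linear(3 * hidden, hidden),
            nn.ReLU(),
            nn.Linear(hidden, n_classes),
        )

    def forward(self, audio, text):
        # Final hidden states summarize each sequence
        _, (a_h, _) = self.audio_lstm(audio)    # a_h: (1, batch, hidden)
        _, (t_h, _) = self.text_bilstm(text)    # t_h: (2, batch, hidden)
        # Concatenate audio state with forward and backward text states
        fused = torch.cat([a_h[-1], t_h[0], t_h[1]], dim=-1)
        return self.head(fused)

model = MultimodalClassifier()
audio = torch.randn(4, 100, 40)   # batch of 4: 100 frames, 40 acoustic features
text = torch.randn(4, 50, 300)    # batch of 4: 50 tokens, 300-d embeddings
logits = model(audio, text)       # shape: (4, 2)
```

In this late-fusion variant each modality is encoded independently before concatenation; the cross-modal fusion mechanism referenced in the paper may instead interleave or attend across modalities, which this sketch does not attempt to reproduce.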