🤖 AI Summary
This study addresses the early identification of anxiety and depressive disorders in adolescents by proposing a multimodal mental state assessment paradigm. To overcome the scarcity of Chinese multimodal psychological assessment data, we introduce MMPsy, the first large-scale multimodal psychological assessment dataset of Chinese adolescents, comprising speech recordings, transcribed text, and standardized clinical scale scores. Methodologically, we design Mental-Perceiver, an end-to-end audio-text collaborative perception model built upon the Perceiver IO architecture. It integrates wav2vec 2.0 speech representations and BERT-based textual embeddings, employing cross-modal attention for feature alignment and joint regression prediction. Experiments on MMPsy and the English DAIC-WOZ benchmark demonstrate that Mental-Perceiver achieves average F1-score improvements of 6.2% and 5.8% on anxiety and depression detection, respectively, significantly outperforming unimodal and existing multimodal baselines. These results validate the efficacy and generalizability of cross-modal collaborative modeling for assistive clinical assessment.
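To make the fusion idea in the summary concrete, here is a minimal NumPy sketch of a Perceiver-style cross-attention step in which a small latent array attends over concatenated audio and text features. It is not the authors' implementation. The feature arrays are random stand-ins for projected wav2vec 2.0 frames and BERT token embeddings, and all names, sizes, the modality tags, and the mean-pooled two-output regression head are assumptions chosen only for illustration.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def cross_attention(latents, inputs, w_q, w_k, w_v):
    # Queries come from the latent array; keys and values come from the inputs.
    q = latents @ w_q
    k = inputs @ w_k
    v = inputs @ w_v
    logits = q @ k.T / np.sqrt(q.shape[-1])
    weights = softmax(logits, axis=-1)  # each latent distributes attention over all inputs
    return weights @ v, weights


rng = np.random.default_rng(0)
d_model = 16  # hypothetical shared feature width

# Random stand-ins (assumption): 50 audio frames and 12 text tokens,
# already projected to the same width.
audio_feats = rng.normal(size=(50, d_model))
text_feats = rng.normal(size=(12, d_model))

# Hypothetical modality tags so attention can tell the two sources apart.
audio_tag = rng.normal(size=(d_model,))
text_tag = rng.normal(size=(d_model,))
inputs = np.concatenate([audio_feats + audio_tag, text_feats + text_tag], axis=0)

# A small latent array, as in Perceiver-style models (size is an assumption).
latents = rng.normal(size=(8, d_model))
w_q, w_k, w_v = (rng.normal(size=(d_model, d_model)) / np.sqrt(d_model) for _ in range(3))

fused, weights = cross_attention(latents, inputs, w_q, w_k, w_v)

# Hypothetical regression head: mean-pool the latents and map them to two scores
# (for example, one anxiety score and one depression score).
w_out = rng.normal(size=(d_model, 2))
predictions = fused.mean(axis=0) @ w_out
```

Because the latent array is much smaller than the input sequence, the cost of this step grows linearly with the number of audio frames and text tokens, which is the usual motivation for Perceiver-style designs on long recordings.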
📝 Abstract
Mental disorders, such as anxiety and depression, have become a global concern that affects people of all ages. Early detection and treatment are crucial to mitigate the negative effects these disorders can have on daily life. Although AI-based detection methods show promise, progress is hindered by the lack of publicly available large-scale datasets. To address this, we introduce the Multi-Modal Psychological assessment corpus (MMPsy), a large-scale dataset containing audio recordings and transcripts from Mandarin-speaking adolescents undergoing automated anxiety/depression assessment interviews. MMPsy also includes self-reported anxiety/depression evaluations using standardized psychological questionnaires. Leveraging this dataset, we propose Mental-Perceiver, a deep learning model for estimating mental disorders from audio and textual data. Extensive experiments on MMPsy and the DAIC-WOZ dataset demonstrate the effectiveness of Mental-Perceiver in anxiety and depression detection.