Mental-Perceiver: Audio-Textual Multi-Modal Learning for Estimating Mental Disorders

📅 2024-08-22
📈 Citations: 0
Influential: 0
🤖 AI Summary
This study addresses the early identification of anxiety and depression disorders in adolescents by proposing a novel multimodal mental-state assessment paradigm. To overcome the scarcity of Chinese multimodal psychological assessment data, we introduce MMPsy—the first large-scale Chinese adolescent multimodal psychological assessment dataset—comprising speech recordings, transcribed text, and standardized clinical scale scores. Methodologically, we design Mental-Perceiver, an end-to-end audio-text collaborative perception model built on the Perceiver IO architecture. It integrates wav2vec 2.0 speech representations and BERT-based textual embeddings, employing cross-modal attention for feature alignment and joint regression prediction. Experiments on MMPsy and the English DAIC-WOZ benchmark show that Mental-Perceiver achieves average F1-score improvements of 6.2% and 5.8% on anxiety and depression detection, respectively, significantly outperforming unimodal and existing multimodal baselines. These results validate the efficacy and generalizability of cross-modal collaborative modeling for computer-assisted clinical assessment.
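The summary describes a Perceiver IO-style design in which a small latent array cross-attends over concatenated audio and text features, yielding a fixed-size representation for regression. The NumPy sketch below illustrates that fusion step only; all dimensions, weight matrices, and variable names are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax over the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attend(latents, inputs, Wq, Wk, Wv):
    # latents: (L, d) learned query array; inputs: (T, d) modality features.
    q, k, v = latents @ Wq, inputs @ Wk, inputs @ Wv
    attn = softmax(q @ k.T / np.sqrt(q.shape[-1]))  # (L, T) attention weights
    return attn @ v                                  # (L, d) fused latents

rng = np.random.default_rng(0)
d = 16
audio = rng.standard_normal((40, d))   # stand-in for wav2vec 2.0 frame features
text = rng.standard_normal((12, d))    # stand-in for BERT token embeddings
inputs = np.concatenate([audio, text], axis=0)  # joint audio-text input array
latents = rng.standard_normal((8, d))  # small latent array, Perceiver-style
Wq, Wk, Wv = (rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(3))

fused = cross_attend(latents, inputs, Wq, Wk, Wv)
print(fused.shape)  # (8, 16): fixed-size summary regardless of input length
```

Because the latent array is fixed-size, the cost of attention scales linearly with input length, which is the key property Perceiver IO exploits for long audio sequences.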

📝 Abstract
Mental disorders, such as anxiety and depression, have become a global concern that affects people of all ages. Early detection and treatment are crucial to mitigate the negative effects these disorders can have on daily life. Although AI-based detection methods show promise, progress is hindered by the lack of publicly available large-scale datasets. To address this, we introduce the Multi-Modal Psychological assessment corpus (MMPsy), a large-scale dataset containing audio recordings and transcripts from Mandarin-speaking adolescents undergoing automated anxiety/depression assessment interviews. MMPsy also includes self-reported anxiety/depression evaluations using standardized psychological questionnaires. Leveraging this dataset, we propose Mental-Perceiver, a deep learning model for estimating mental disorders from audio and textual data. Extensive experiments on MMPsy and the DAIC-WOZ dataset demonstrate the effectiveness of Mental-Perceiver in anxiety and depression detection.
Problem

Research questions and friction points this paper is trying to address.

Detects mental disorders using multimodal data
Addresses dataset scarcity for AI-based methods
Improves early detection of anxiety and depression
Innovation

Methods, ideas, or system contributions that make the work stand out.

Multi-Modal Psychological assessment corpus (MMPsy)
Mental-Perceiver deep learning model
Audio and textual data analysis
Jinghui Qin
Guangdong University of Technology
Multimodal Deep Learning · Natural Language Processing · Computer Vision
Changsong Liu
Shuye Intelligent Co., Ltd., University of Toronto
Tianchi Tang
Shuye Intelligent Co., Ltd.
Dahuang Liu
Shuye Intelligent Co., Ltd.
Minghao Wang
Shuye Intelligent Co., Ltd.
Qianying Huang
Shuye Intelligent Co., Ltd.
Rumin Zhang
Ningbo Institute of Digital Twin (EIAS)