Mental-Perceiver: Audio-Textual Multi-Modal Learning for Estimating Mental Disorders

📅 2024-08-22
📈 Citations: 0
Influential: 0
🤖 AI Summary
This study addresses the early identification of anxiety and depression disorders in adolescents by proposing a novel multimodal mental-state assessment paradigm. To overcome the scarcity of Chinese multimodal psychological assessment data, we introduce MMPsy—the first large-scale Chinese adolescent multimodal psychological assessment dataset—comprising speech recordings, transcribed text, and standardized clinical scale scores. Methodologically, we design Mental-Perceiver, an end-to-end audio-text collaborative perception model built on the Perceiver IO architecture. It integrates wav2vec 2.0 speech representations and BERT-based textual embeddings, employing cross-modal attention for feature alignment and joint regression prediction. Experiments on MMPsy and the English DAIC-WOZ benchmark show that Mental-Perceiver achieves average F1-score improvements of 6.2% and 5.8% on anxiety and depression detection, respectively, significantly outperforming unimodal and existing multimodal baselines. These results validate the efficacy and generalizability of cross-modal collaborative modeling for computer-assisted clinical assessment.
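The summary describes a Perceiver IO-style design in which a small latent array cross-attends over concatenated audio and text features, yielding a fixed-size representation for regression. The NumPy sketch below illustrates that fusion step only; all dimensions, weight matrices, and variable names are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax over the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attend(latents, inputs, Wq, Wk, Wv):
    # latents: (L, d) learned query array; inputs: (T, d) modality features.
    q, k, v = latents @ Wq, inputs @ Wk, inputs @ Wv
    attn = softmax(q @ k.T / np.sqrt(q.shape[-1]))  # (L, T) attention weights
    return attn @ v                                  # (L, d) fused latents

rng = np.random.default_rng(0)
d = 16
audio = rng.standard_normal((40, d))   # stand-in for wav2vec 2.0 frame features
text = rng.standard_normal((12, d))    # stand-in for BERT token embeddings
inputs = np.concatenate([audio, text], axis=0)  # joint audio-text input array
latents = rng.standard_normal((8, d))  # small latent array, Perceiver-style
Wq, Wk, Wv = (rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(3))

fused = cross_attend(latents, inputs, Wq, Wk, Wv)
print(fused.shape)  # (8, 16): fixed-size summary regardless of input length
```

Because the latent array is fixed-size, the cost of attention scales linearly with input length, which is the key property Perceiver IO exploits for long audio sequences.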

📝 Abstract
Mental disorders, such as anxiety and depression, have become a global concern that affects people of all ages. Early detection and treatment are crucial to mitigate the negative effects these disorders can have on daily life. Although AI-based detection methods show promise, progress is hindered by the lack of publicly available large-scale datasets. To address this, we introduce the Multi-Modal Psychological assessment corpus (MMPsy), a large-scale dataset containing audio recordings and transcripts from Mandarin-speaking adolescents undergoing automated anxiety/depression assessment interviews. MMPsy also includes self-reported anxiety/depression evaluations using standardized psychological questionnaires. Leveraging this dataset, we propose Mental-Perceiver, a deep learning model for estimating mental disorders from audio and textual data. Extensive experiments on MMPsy and the DAIC-WOZ dataset demonstrate the effectiveness of Mental-Perceiver in anxiety and depression detection.
Problem

Research questions and friction points this paper is trying to address.

Detects mental disorders using multimodal data
Addresses dataset scarcity for AI-based methods
Improves early detection of anxiety and depression
Innovation

Methods, ideas, or system contributions that make the work stand out.

Multi-Modal Psychological assessment corpus (MMPsy)
Mental-Perceiver deep learning model
Audio and textual data analysis
Jinghui Qin
Guangdong University of Technology
Multimodal Deep Learning · Natural Language Processing · Computer Vision
Changsong Liu
Shuye Intelligent Co., Ltd., University of Toronto
Tianchi Tang
Shuye Intelligent Co., Ltd.
Dahuang Liu
Shuye Intelligent Co., Ltd.
Minghao Wang
Shuye Intelligent Co., Ltd.
Qianying Huang
Shuye Intelligent Co., Ltd.
Rumin Zhang
Ningbo Institute of Digital Twin (EIAS)