Learning to Attend to Depression-Related Patterns: An Adaptive Cross-Modal Gating Network for Depression Detection

📅 2026-04-11
📈 Citations: 0
Influential: 0

🤖 AI Summary
This study addresses a limitation of existing speech-based depression detection methods: they typically assume that depression-related features are uniformly distributed across utterances and so overlook their inherent sparsity. To overcome this, the authors propose a bimodal network with an Adaptive Cross-Modal Gating (ACMG) mechanism that dynamically reweights frames in both the acoustic and textual modalities to attend selectively to depression-relevant segments. The work introduces ACMG and combines it with attention mechanisms for frame-level feature reweighting, capturing clinically meaningful but sparsely distributed patterns such as low-energy speech segments and negatively valenced lexical content. Experiments show that the proposed model outperforms baselines without ACMG, and visualization analyses indicate that ACMG automatically focuses on these depression indicators.

📝 Abstract
Automatic depression detection using speech signals with acoustic and textual modalities is a promising approach for early diagnosis. Depression-related patterns exhibit sparsity in speech: diagnostically relevant features occur in specific segments rather than being uniformly distributed. However, most existing methods treat all frames equally, assuming depression-related information is uniformly distributed and thus overlooking this sparsity. To address this issue, we propose a depression detection network based on Adaptive Cross-Modal Gating (ACMG) that adaptively reassigns frame-level weights across both modalities, enabling selective attention to depression-related segments. Experimental results show that the depression detection system with ACMG outperforms baselines without it. Visualization analyses further confirm that ACMG automatically attends to clinically meaningful patterns, including low-energy acoustic segments and textual segments containing negative sentiment.
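
The abstract does not give the gating equations, so the following is only a minimal sketch of the general idea of frame-level reweighting conditioned on the other modality, not the authors' ACMG implementation. It assumes a simple sigmoid gate of the form sigmoid(w_self · frame + w_cross · context + bias), where the context is the mean-pooled feature vector of the other modality. The function names, the two-dimensional toy features, and the hand-picked weights are all hypothetical; in a real model the weights would be learned.

```python
import math


def sigmoid(x):
    """Logistic function mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def mean_pool(frames):
    """Uniform average of equal-length feature vectors (treats all frames equally)."""
    n = len(frames)
    return [sum(column) / n for column in zip(*frames)]


def cross_modal_gate(frames, context, w_self, w_cross, bias=0.0):
    """Return one gate value in (0, 1) per frame.

    Assumed form (not taken from the paper):
        gate_t = sigmoid(w_self . frame_t + w_cross . context + bias)
    where `context` summarizes the other modality.
    """
    cross_term = sum(w * c for w, c in zip(w_cross, context))
    gates = []
    for frame in frames:
        self_term = sum(w * f for w, f in zip(w_self, frame))
        gates.append(sigmoid(self_term + cross_term + bias))
    return gates


def gated_pool(frames, gates):
    """Gate-weighted average: frames with larger gates contribute more."""
    total = sum(gates)
    dim = len(frames[0])
    return [sum(g * f[i] for g, f in zip(gates, frames)) / total for i in range(dim)]


# Toy acoustic frames: [energy, pitch]. The second frame is the low-energy one.
acoustic = [[0.9, 0.2], [0.1, 0.3], [0.8, 0.1]]
# Toy text frames: [neutral score, negative-sentiment score].
text = [[0.0, 1.0], [0.5, 0.5]]

# Summarize the text modality and use it to condition the acoustic gates.
text_context = mean_pool(text)

# Hand-picked weights (hypothetical): a negative weight on energy favors
# low-energy frames; a positive weight on the negative-sentiment score
# raises all acoustic gates when the text is more negative.
gates = cross_modal_gate(acoustic, text_context, w_self=[-4.0, 0.0], w_cross=[0.0, 1.0])

uniform_summary = mean_pool(acoustic)
gated_summary = gated_pool(acoustic, gates)
```

With these toy weights the low-energy frame receives the largest gate, so the gated summary is pulled toward that frame, whereas uniform pooling weights all three frames equally. The same function could be applied in the other direction, gating text frames with an acoustic context vector.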
Problem

Research questions and friction points this paper is trying to address.

depression detection
speech signals
sparsity
cross-modal
attention
Innovation

Methods, ideas, or system contributions that make the work stand out.

Adaptive Cross-Modal Gating
Depression Detection
Sparse Temporal Patterns
Multimodal Attention
Speech-based Diagnosis
Hangbin Yu
Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, China; University of Chinese Academy of Sciences; Key Laboratory of Biomedical Imaging Science and System, Chinese Academy of Sciences, China
Yudong Yang
Tsinghua University
Multimodal LLM, Speech Processing
Rongfeng Su
Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, China; University of Chinese Academy of Sciences; Key Laboratory of Biomedical Imaging Science and System, Chinese Academy of Sciences, China
Nan Yan
Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, China; University of Chinese Academy of Sciences; Key Laboratory of Biomedical Imaging Science and System, Chinese Academy of Sciences, China
Lan Wang
Professor of Computer Science, University of Memphis
computer networks