DepFlow: Disentangled Speech Generation to Mitigate Semantic Bias in Depression Detection

2026-01-01
arXiv.org
Citations: 0
Influential: 0
AI Summary
Existing depression detection models tend to rely on spurious correlations between linguistic sentiment and diagnostic labels, which undermines robustness in real-world scenarios such as camouflaged depression, where a speaker's language stays positive or neutral despite an underlying depressive state. To address this limitation, this work proposes DepFlow, a three-stage conditional speech synthesis framework that disentangles depressive acoustic features from speaker identity and textual content while providing an interpretable, continuous control over depression severity. Its three components are an adversarially trained depression acoustic encoder, a FiLM-modulated flow-matching TTS model, and a prototype-based severity mapping. With DepFlow, the authors generate CDoA, an augmentation dataset of deliberate acoustic-semantic mismatches that pairs depressed-sounding speech with positive or neutral text. Training on CDoA improves macro-F1 by 9%, 12%, and 5%, respectively, across three depression detection architectures, consistently outperforming conventional data augmentation strategies.
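The reported gains are in macro-F1, the unweighted mean of per-class F1 scores, which weights the minority (depressed) class as heavily as the majority class. The following is a minimal sketch of the metric in plain Python. The function name `macro_f1` and the toy labels are illustrative assumptions and do not come from the paper or its evaluation code.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over all labels seen in either list."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never appears.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Made-up, imbalanced toy labels: 1 = depressed, 0 = non-depressed.
y_true = [0, 0, 0, 0, 0, 0, 1, 1]
y_pred = [0, 0, 0, 0, 0, 1, 1, 0]

# Class 0 has F1 = 5/6 and class 1 has F1 = 1/2, so the macro average is 2/3.
print(round(macro_f1(y_true, y_pred), 3))  # 0.667
```

Plain accuracy on the same toy data is 0.75, which shows how macro-F1 penalizes errors on the smaller class more visibly.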

Abstract
Speech is a scalable and non-invasive biomarker for early mental health screening. However, widely used depression datasets like DAIC-WOZ exhibit strong coupling between linguistic sentiment and diagnostic labels, encouraging models to learn semantic shortcuts. As a result, model robustness may be compromised in real-world scenarios, such as Camouflaged Depression, where individuals maintain socially positive or neutral language despite underlying depressive states. To mitigate this semantic bias, we propose DepFlow, a three-stage depression-conditioned text-to-speech framework. First, a Depression Acoustic Encoder learns speaker- and content-invariant depression embeddings through adversarial training, achieving effective disentanglement while preserving depression discriminability (ROC-AUC: 0.693). Second, a flow-matching TTS model with FiLM modulation injects these embeddings into synthesis, enabling control over depressive severity while preserving content and speaker identity. Third, a prototype-based severity mapping mechanism provides smooth and interpretable manipulation across the depression continuum. Using DepFlow, we construct a Camouflage Depression-oriented Augmentation (CDoA) dataset that pairs depressed acoustic patterns with positive/neutral content from a sentiment-stratified text bank, creating acoustic-semantic mismatches underrepresented in natural data. Evaluated across three depression detection architectures, CDoA improves macro-F1 by 9%, 12%, and 5%, respectively, consistently outperforming conventional augmentation strategies in depression detection. Beyond enhancing robustness, DepFlow provides a controllable synthesis platform for conversational systems and simulation-based evaluation, where real clinical data remains limited by ethical and coverage constraints.
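To make the second and third stages more concrete, the sketch below shows generic FiLM conditioning (a feature-wise scale and shift, `gamma * h + beta`, predicted from a conditioning vector) driven by an embedding blended from two severity prototypes. This is a hedged illustration only: the abstract does not specify how prototypes are combined or where FiLM is applied, so the linear interpolation, the function names (`interpolate_prototypes`, `linear`, `film`), and all weights and dimensions are assumptions rather than the authors' implementation.

```python
def interpolate_prototypes(low, high, severity):
    """Blend two prototype embeddings linearly (assumed scheme).

    severity = 0.0 returns `low`, severity = 1.0 returns `high`.
    """
    if not 0.0 <= severity <= 1.0:
        raise ValueError("severity must lie in [0, 1]")
    return [(1.0 - severity) * a + severity * b for a, b in zip(low, high)]


def linear(weights, bias, x):
    """Dense layer: one output per weight row, y_i = w_i . x + b_i."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def film(hidden, cond, w_gamma, b_gamma, w_beta, b_beta):
    """FiLM: scale and shift each hidden feature using the condition."""
    gamma = linear(w_gamma, b_gamma, cond)  # per-feature scale
    beta = linear(w_beta, b_beta, cond)     # per-feature shift
    return [g * h + b for g, h, b in zip(gamma, hidden, beta)]


# Hypothetical 2-d prototypes for the two ends of the severity range.
proto_low, proto_high = [0.0, 0.0], [1.0, 2.0]
cond = interpolate_prototypes(proto_low, proto_high, 0.5)  # [0.5, 1.0]

# Hypothetical projection weights mapping the 2-d condition to 3 features.
w_gamma = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
b_gamma = [0.0, 0.0, 0.0]
w_beta = [[0.0, 0.0], [0.0, 0.0], [0.0, 0.0]]
b_beta = [1.0, 1.0, 1.0]

hidden = [2.0, 4.0, 6.0]  # stand-in for one frame of TTS hidden features
print(film(hidden, cond, w_gamma, b_gamma, w_beta, b_beta))  # [2.0, 5.0, 10.0]
```

In a real model the projections would be learned and applied inside the flow-matching network's layers; the point here is only that a single continuous severity value can modulate every hidden feature without touching the text or speaker inputs.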
Problem

Research questions and friction points this paper is trying to address.

semantic bias
depression detection
camouflaged depression
speech biomarker
model robustness
Innovation

Methods, ideas, or system contributions that make the work stand out.

disentangled speech generation
semantic bias mitigation
depression-conditioned TTS
flow-matching
acoustic-semantic mismatch
Authors
Yuxin Li
College of Computing and Data Science, Nanyang Technological University, Singapore
Xiangyu Zhang
PhD Student, University of New South Wales
Speech and Language Technology, Multimodal, Foundation Model, Digital Health
Yifei Li
College of Computing and Data Science, Nanyang Technological University, Singapore
Zhiwei Guo
College of Computing and Data Science, Nanyang Technological University, Singapore
Haoyang Zhang
Ph.D. student of Computer Science, University of Illinois Urbana-Champaign
Computer Architecture, System Software
E. Chng
College of Computing and Data Science, Nanyang Technological University, Singapore
Cuntai Guan
President's Chair Professor, CCDS, Nanyang Technological University
Brain-Computer Interfaces, Machine Learning, Artificial Intelligence