🤖 AI Summary
This study addresses the privacy risk in audio-based depression diagnosis: speech carries sensitive identity information that is open to misuse and hard to disentangle from depression-related features. The paper proposes TAAC (Trustable Audio Affective Computing), which the authors present as a first practical framework of its kind, and which separates depression features from identity features through adversarial loss-based subspace decomposition. TAAC combines a Differentiating Features Subspace Decompositor (DFSD), a Flexible Noise Encryptor (FNE), and a Staged Training Paradigm to encrypt identity information while preserving diagnostic accuracy. In the reported experiments, TAAC outperforms existing encryption methods in depression detection performance, identity privacy protection, and audio reconstruction quality, and remains stable across different encryption strengths.
📝 Abstract
With the emergence of AI techniques for depression diagnosis, the conflict between high demand and limited supply for depression screening has been significantly alleviated. Among the available data modalities, audio-based depression diagnosis has received increasing attention from both academia and industry, since audio is the most common carrier of emotion. Unfortunately, audio data also contains User-sensitive Identity Information (ID), which is extremely vulnerable and may be maliciously used during the smart diagnosis process. In previous methods, separating depression features from sensitive features has always been a barrier. It is equally critical to introduce a safe encryption methodology that encrypts only the sensitive features, together with a powerful classifier that can correctly diagnose depression. To tackle these challenges, we leverage adversarial loss-based Subspace Decomposition and propose TAAC, a first practical framework for Trustable Audio Affective Computing, which performs automated depression detection from audio within a trustable environment. The key enablers of TAAC are the Differentiating Features Subspace Decompositor (DFSD), the Flexible Noise Encryptor (FNE) and the Staged Training Paradigm, used for decomposition, ID encryption and performance enhancement, respectively. Extensive experiments against existing encryption methods demonstrate our framework's superior performance in depression detection, ID reservation and audio reconstruction. Experiments across various settings further demonstrate the model's stability under different encryption strengths, supporting the framework's Confidentiality, Accuracy, Traceability, and Adjustability.
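The core idea of encrypting only the identity-related part of a representation, with an adjustable strength, can be illustrated with a toy sketch. This is not the paper's method. The fixed index split is a hypothetical stand-in for the learned DFSD, the additive Gaussian noise is an assumed stand-in for the FNE, and the adversarial loss and staged training are omitted entirely. All function names and values are invented for illustration.

```python
import random


def split_subspaces(features, dep_dim):
    """Split a feature vector into a depression part and an identity part.

    Assumption: a fixed index split replaces the learned decomposer.
    """
    return features[:dep_dim], features[dep_dim:]


def encrypt_identity(id_part, strength, rng):
    """Add zero-mean Gaussian noise, scaled by `strength`, to the identity part."""
    return [x + strength * rng.gauss(0.0, 1.0) for x in id_part]


def protect(features, dep_dim, strength, seed=0):
    """Perturb only the identity subspace; leave the depression subspace untouched."""
    rng = random.Random(seed)
    dep, ident = split_subspaces(features, dep_dim)
    return dep + encrypt_identity(ident, strength, rng)


# Hypothetical 6-dimensional feature vector: first 3 dims "depression", last 3 "identity".
features = [0.2, -1.3, 0.7, 1.5, -0.4, 0.9]
protected = protect(features, dep_dim=3, strength=2.0)
unchanged = protect(features, dep_dim=3, strength=0.0)
```

In this sketch, `strength` plays the role of the encryption strength mentioned in the abstract: at `0.0` the vector is returned unchanged, and larger values distort the identity dimensions more while the depression dimensions stay identical.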