Speaker-Disentangled Remote Speech Detection of Asthma and COPD Exacerbations

📅 2026-05-16
📈 Citations: 0
✨ Influential: 0

🤖 AI Summary
This study addresses the challenge that speaker identity information in voice-based monitoring of asthma and COPD exacerbations can interfere with pathological assessment and compromise privacy. The authors propose an adversarial learning architecture employing a gradient reversal layer to disentangle pathology-relevant acoustic features from speaker-specific characteristics, while jointly optimizing two clinical tasks: respiratory state classification and exacerbation type identification. This work presents the first speaker-invariant pathological feature extraction framework for respiratory disease voice analysis and enhances model interpretability through SHAP. Evaluated on the TACTICAS dataset, the approach achieves AUCs of 0.910 and 0.793 for the two tasks, respectively, with significant suppression of speaker information. Cross-dataset validation on Bridge2AI-Voice further demonstrates the method's generalizability.
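For readers unfamiliar with the AUC figures quoted above, the following is a minimal sketch of how an area under the ROC curve can be computed from binary labels and model scores, using the pairwise-ranking (Mann-Whitney) formulation. The function name `auc` and the toy labels and scores are illustrative assumptions; they are not data or code from the paper.

```python
def auc(labels, scores):
    """Probability that a random positive is scored above a random negative.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy example (made-up values): 1 = exacerbated, 0 = stable.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.1]
result = auc(labels, scores)
print(round(result, 3))
```

An AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation of the two classes.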
📝 Abstract
Early detection of exacerbations in asthma and chronic obstructive pulmonary disease (COPD) is important for timely intervention. Speech has emerged as a promising tool for continuous, non-invasive respiratory disease monitoring. However, speech signals inherently carry speaker-identifiable attributes that may dominate model predictions, which can compromise both diagnostic performance and patient privacy. Furthermore, it remains unclear which acoustic features relate to respiratory disease and which to speaker identity. We propose an adversarial learning architecture that disentangles pathology-related acoustic patterns from speaker-identifiable attributes. The framework optimizes two clinically hierarchical tasks: (i) respiratory status classification (stable vs. exacerbated) and (ii) exacerbation type classification (asthma exacerbation vs. COPD exacerbation). Speaker identity is suppressed through gradient reversal-based adversarial training. To enhance clinical interpretability, we employ SHapley Additive exPlanations (SHAP) to quantify the contributions of acoustic features to pathology-related predictions versus speaker identity. On the TACTICAS dataset, our method outperforms the single-task baseline across both tasks. For the respiratory status task (stable vs. exacerbated), the AUC improves from 0.897 to 0.910. For the exacerbation type task (asthma exacerbation vs. COPD exacerbation), the AUC increases from 0.674 to 0.793. Concurrently, the J-ratio decreases, confirming effective suppression of speaker information. SHAP analysis reveals the contributions of the acoustic features to both tasks. External validation on the Bridge2AI-Voice dataset further demonstrates consistent performance improvement and reduced speaker dependency, confirming cross-dataset generalizability.
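The gradient reversal idea mentioned in the abstract can be sketched in a few lines. The layer acts as the identity in the forward pass and multiplies incoming gradients by a negative factor in the backward pass, so the shared encoder is pushed to make the speaker head's job harder while still serving the pathology head. This is a minimal pure-Python illustration, not the authors' implementation: the names `GradReverse`, `encoder_grad`, and `lam`, and all numeric values, are assumptions made for the example.

```python
class GradReverse:
    """Identity in the forward pass; scales gradients by -lam in the backward pass."""

    def __init__(self, lam=1.0):
        self.lam = lam  # assumed reversal strength (hypothetical hyperparameter)

    def forward(self, z):
        # Features reach the speaker head unchanged.
        return list(z)

    def backward(self, grad_out):
        # The sign flip turns the speaker head's descent direction
        # into an ascent direction for the encoder.
        return [-self.lam * g for g in grad_out]


def encoder_grad(grad_task, grad_speaker, grl):
    """Combine the gradients that reach the shared encoder output z.

    grad_task:    dL_task/dz from a pathology head, used as-is.
    grad_speaker: dL_speaker/dz from the speaker head, reversed by the GRL.
    """
    reversed_grad = grl.backward(grad_speaker)
    return [gt + gs for gt, gs in zip(grad_task, reversed_grad)]


grl = GradReverse(lam=0.5)
z = [0.2, -1.0, 3.0]                      # toy encoder output
speaker_input = grl.forward(z)            # unchanged copy of z
total = encoder_grad([1.0, 0.0, -2.0],    # toy pathology-head gradient
                     [4.0, 2.0, -2.0],    # toy speaker-head gradient
                     grl)
print(speaker_input)  # [0.2, -1.0, 3.0]
print(total)          # [-1.0, -1.0, -1.0]
```

In an autodiff framework the same behavior is usually obtained with a custom backward rule, and the speaker head itself is still trained with the unreversed gradient; only the signal flowing back into the encoder is flipped.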
Problem

Research questions and friction points this paper is trying to address.

speech-based monitoring
speaker disentanglement
asthma exacerbation
COPD exacerbation
patient privacy
Innovation

Methods, ideas, or system contributions that make the work stand out.

adversarial disentanglement
speaker-invariant speech analysis
respiratory disease monitoring
SHAP interpretability
multi-task learning
Yuyang Yan
Institute of Data Science, Maastricht University, Paul-Henri Spaaklaan 1, Maastricht, 6229 EN, the Netherlands
Sami O. Simons
Department of Respiratory Medicine, NUTRIM Research Institute of Nutrition and Translational Research in Metabolism, Faculty of Health Medicine and Life Sciences, Maastricht University, P. Debyelaan 25, Maastricht, 6229 HX, the Netherlands; Department of Respiratory Medicine, Maastricht University Medical Centre, P. Debyelaan 25, Maastricht, 6229 HX, the Netherlands
Visara Urovi
Associate Professor, University of Maastricht, Netherlands
Artificial Intelligence, eHealth, Distributed Systems, Blockchain, Multi-agent Systems