AI Summary
This study addresses the challenge that speaker identity information in voice-based monitoring of asthma and COPD exacerbations can interfere with pathological assessment and compromise privacy. The authors propose an adversarial learning architecture employing a gradient reversal layer to disentangle pathology-relevant acoustic features from speaker-specific characteristics, while jointly optimizing two clinical tasks: respiratory state classification and exacerbation type identification. This work presents the first speaker-invariant pathological feature extraction framework for respiratory disease voice analysis and enhances model interpretability through SHAP. Evaluated on the TACTICAS dataset, the approach achieves AUCs of 0.910 and 0.793 for the two tasks, respectively, with significant suppression of speaker information. Cross-dataset validation on Bridge2AI-Voice further demonstrates the method's generalizability.
Abstract
Early detection of exacerbations in asthma and chronic obstructive pulmonary disease (COPD) is important for timely intervention. Speech has emerged as a promising tool for continuous, non-invasive respiratory disease monitoring. However, speech signals inherently carry speaker-identifiable attributes that may dominate model predictions, which may compromise both diagnostic performance and patient privacy. Furthermore, it remains unclear which acoustic features are associated with respiratory disease and which with speaker identity. We propose an adversarial learning architecture that disentangles pathology-related acoustic patterns from speaker-identifiable attributes. The framework optimizes two clinically hierarchical tasks: (i) respiratory status classification (stable vs. exacerbated) and (ii) exacerbation type classification (asthma exacerbation vs. COPD exacerbation). Speaker identity is suppressed through gradient reversal-based adversarial training. To enhance clinical interpretability, we employ SHapley Additive exPlanations (SHAP) to quantify the contributions of acoustic features to pathology-related predictions versus speaker identity. On the TACTICAS dataset, our method outperforms the single-task baseline across both tasks. For the respiratory status task (stable vs. exacerbated), the AUC improves from 0.897 to 0.910. For the exacerbation type task (asthma exacerbation vs. COPD exacerbation), the AUC increases from 0.674 to 0.793. Concurrently, the J-ratio decreases, confirming effective suppression of speaker information. SHAP analysis reveals the contributions of the acoustic features to both tasks. External validation on the Bridge2AI-Voice dataset further demonstrates consistent performance improvement and reduced speaker dependency, confirming cross-dataset generalizability.
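The gradient reversal idea mentioned in the abstract can be illustrated with a toy scalar model. The sketch below is not the authors' architecture: the one-parameter encoder, the squared-error losses, and every name (`w`, `a`, `b`, `lam`, `encoder_gradients`) are assumptions chosen only to make the mechanism visible. A gradient reversal layer acts as the identity in the forward pass and multiplies the incoming gradient by a negative factor in the backward pass, so the speaker head still learns to predict the speaker while the shared encoder is pushed to make that prediction harder.

```python
def grl_forward(z):
    """Gradient reversal layer, forward pass: the identity."""
    return z


def grl_backward(grad_output, lam):
    """Gradient reversal layer, backward pass: flip the sign and scale by lam."""
    return -lam * grad_output


def encoder_gradients(w, a, b, x, y, s, lam):
    """Manual backpropagation through a toy model (all names are illustrative).

    z            = w * x                 shared encoder feature
    task loss    = (a * z - y) ** 2      pathology head (hypothetical)
    speaker loss = (b * grl(z) - s) ** 2 speaker head behind the reversal layer
    """
    z = w * x

    # Pathology head: its gradient reaches the encoder unchanged.
    d_task_dz = 2.0 * (a * z - y) * a

    # Speaker head: the forward pass sees z unchanged.
    z_adv = grl_forward(z)
    spk_err = b * z_adv - s
    grad_b = 2.0 * spk_err * z_adv   # the head itself still minimizes its loss
    d_spk_dz = 2.0 * spk_err * b

    # The encoder receives the speaker gradient with its sign flipped.
    d_spk_dz_reversed = grl_backward(d_spk_dz, lam)

    grad_w = (d_task_dz + d_spk_dz_reversed) * x
    return grad_w, grad_b


grad_w, grad_b = encoder_gradients(w=0.5, a=1.0, b=2.0, x=2.0, y=1.0, s=0.0, lam=1.0)
```

With these illustrative values the task loss is already zero, so the encoder gradient comes entirely from the reversed speaker term. A gradient descent step on `w` therefore moves the encoder in the direction that increases the speaker loss, while the same step on `b` decreases it. Setting `lam=0.0` removes the adversarial signal from the encoder. In an automatic differentiation framework the same behavior is usually obtained with a custom backward function rather than hand-written gradients.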