🤖 AI Summary
To address the scarcity of effective screening tools for tuberculosis (TB) in high-burden, low-resource settings, this study proposes a scalable AI-based cough audio screening method. Prior work was limited by small datasets, poorly representative negative controls, simple model architectures, and idealised recording conditions. The study introduces a multimodal deep learning classifier that combines pretrained speech foundation models with demographic and clinical features. Evaluated on real-world cough recordings from 500 participants in Zambia, the audio-only model achieves an AUROC of 85.2% for distinguishing TB+ from all others. Adding demographic and clinical features raises this to 92.1% AUROC, with 90.3% sensitivity and 73.1% specificity at a threshold of 0.38, meeting WHO target product profile benchmarks. The results suggest that cough-based analysis could make initial TB triage more accessible in resource-constrained settings, pending further validation.
📝 Abstract
Background
Artificial intelligence (AI) can detect disease-related acoustic patterns in cough sounds, offering a scalable approach to tuberculosis (TB) screening in high-burden, low-resource settings. Previous studies have been limited by small datasets, under-representation of symptomatic non-TB patients, reliance on simple models, and recordings collected under idealised conditions.
Methods
We enrolled 512 participants at two hospitals in Zambia, grouped as bacteriologically confirmed TB (TB+), symptomatic patients with other respiratory diseases (OR), and healthy controls (HC). Usable cough recordings plus demographic and clinical data were obtained from 500 participants. Deep learning classifiers based on speech foundation models were trained on cough recordings. The best-performing model, trained on 3-second segments, was further evaluated with demographic and clinical features.
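As an illustration of the 3-second segmentation step, the following is a minimal sketch rather than the study's pipeline. It assumes a mono waveform held in a NumPy array, non-overlapping windows, and zero-padding of the final partial window; the 16 kHz sample rate in the usage lines and the function name `split_into_segments` are hypothetical choices, since the abstract does not state how segments were produced.

```python
import numpy as np


def split_into_segments(waveform, sample_rate, segment_seconds=3.0, pad_last=True):
    """Cut a 1-D waveform into fixed-length, non-overlapping segments.

    Assumption: the last partial window is zero-padded (pad_last=True)
    or dropped (pad_last=False). The source does not specify either.
    """
    waveform = np.asarray(waveform)
    seg_len = int(round(segment_seconds * sample_rate))
    segments = []
    for start in range(0, len(waveform), seg_len):
        chunk = waveform[start:start + seg_len]
        if len(chunk) < seg_len:
            if not pad_last:
                break
            # Zero-pad the tail so every segment has the same length.
            chunk = np.pad(chunk, (0, seg_len - len(chunk)))
        segments.append(chunk)
    if not segments:
        return np.empty((0, seg_len), dtype=waveform.dtype)
    return np.stack(segments)


# Usage with a synthetic 7-second signal at an assumed 16 kHz sample rate.
sample_rate = 16_000
recording = np.zeros(7 * sample_rate, dtype=np.float32)
padded = split_into_segments(recording, sample_rate)
dropped = split_into_segments(recording, sample_rate, pad_last=False)
```

Each row of the returned array could then be passed to a speech foundation model as one input; how per-segment outputs are aggregated per participant is not described in the abstract.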
Findings
The best audio-only classifier achieved an AUROC of 85.2% for distinguishing TB+ from all others (TB+/Rest) and 80.1% for TB+ versus OR. Adding demographic and clinical features improved AUROC to 92.1% (TB+/Rest) and 84.2% (TB+/OR). At a threshold of 0.38, the multimodal model reached 90.3% sensitivity and 73.1% specificity for TB+/Rest, and 80.6% sensitivity and 73.1% specificity for TB+/OR.
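To make the threshold-based metrics concrete, here is a minimal sketch of how sensitivity and specificity follow from model scores at a fixed operating point. The labels and scores are made-up toy values, not study data; the function name `sensitivity_specificity` and the convention that a score at or above the threshold counts as TB+ are assumptions.

```python
import numpy as np


def sensitivity_specificity(y_true, scores, threshold=0.38):
    """Return (sensitivity, specificity) at a given decision threshold.

    Assumption: scores >= threshold are predicted positive (TB+).
    """
    y_true = np.asarray(y_true).astype(bool)
    pred = np.asarray(scores) >= threshold
    tp = np.sum(pred & y_true)      # TB+ correctly flagged
    fn = np.sum(~pred & y_true)     # TB+ missed
    tn = np.sum(~pred & ~y_true)    # non-TB correctly cleared
    fp = np.sum(pred & ~y_true)     # non-TB incorrectly flagged
    return float(tp / (tp + fn)), float(tn / (tn + fp))


# Toy example: 3 positives and 4 negatives (illustrative values only).
labels = [1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.5, 0.2, 0.1, 0.4, 0.3, 0.05]
sens, spec = sensitivity_specificity(labels, scores, threshold=0.38)
```

Lowering the threshold trades specificity for sensitivity, which is why a triage tool is usually reported at a single named operating point alongside its AUROC.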
Interpretation
Cough analysis using speech foundation models, especially when combined with demographic and clinical data, showed strong potential as a TB triage tool, meeting WHO target product profile benchmarks. The model was robust to confounding factors including background noise, recording time, and device variability, indicating detection of genuine disease-related acoustic patterns. Further validation across diverse regions and case definitions, including subclinical TB, is required before clinical use.