AI-enabled tuberculosis screening in a high-burden setting using cough sound analysis and speech foundation models

📅 2025-09-11

🤖 AI Summary

Effective tuberculosis (TB) screening tools remain scarce in high-burden, low-resource settings. This study proposes an AI-based cough audio screening method that addresses limitations of prior work, including small datasets, poorly representative negative controls, simple model architectures, and idealised recording conditions. The classifier builds on pretrained speech foundation models and can additionally take demographic and clinical features as input. On cough recordings collected at two hospitals in Zambia, the audio-only model achieves an AUROC of 85.2% for distinguishing TB+ from all other participants. Adding demographic and clinical features raises this to 92.1% AUROC, with 90.3% sensitivity and 73.1% specificity at a threshold of 0.38, which the authors report as meeting WHO target product profile benchmarks. The results suggest that cough analysis could serve as an accessible TB triage tool in resource-constrained settings, pending further validation.
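For readers less familiar with the headline metric, AUROC can be read as the probability that a randomly chosen TB+ participant receives a higher model score than a randomly chosen non-TB participant. The following is a minimal sketch of that definition in plain Python. The function name and the toy scores are hypothetical and are not data or code from the study.

```python
def auroc(scores, labels):
    """AUROC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a correct pair.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy scores (1 = TB+, 0 = not TB); not values from the paper.
toy_scores = [0.9, 0.8, 0.35, 0.6, 0.2, 0.1]
toy_labels = [1, 1, 1, 0, 0, 0]
toy_auroc = auroc(toy_scores, toy_labels)  # 8 of 9 pairs are ordered correctly
```

The pairwise form is quadratic in the number of examples, which is fine for illustration. A library routine would normally be used on real data.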

📝 Abstract

Background: Artificial intelligence (AI) can detect disease-related acoustic patterns in cough sounds, offering a scalable approach to tuberculosis (TB) screening in high-burden, low-resource settings. Previous studies have been limited by small datasets, under-representation of symptomatic non-TB patients, reliance on simple models, and recordings collected under idealised conditions.

Methods: We enrolled 512 participants at two hospitals in Zambia, grouped as bacteriologically confirmed TB (TB+), symptomatic patients with other respiratory diseases (OR), and healthy controls (HC). Usable cough recordings plus demographic and clinical data were obtained from 500 participants. Deep learning classifiers based on speech foundation models were trained on cough recordings. The best-performing model, trained on 3-second segments, was further evaluated with demographic and clinical features.

Findings: The best audio-only classifier achieved an AUROC of 85.2% for distinguishing TB+ from all others (TB+/Rest) and 80.1% for TB+ versus OR. Adding demographic and clinical features improved performance to 92.1% (TB+/Rest) and 84.2% (TB+/OR). At a threshold of 0.38, the multimodal model reached 90.3% sensitivity and 73.1% specificity for TB+/Rest, and 80.6% and 73.1% for TB+/OR.

Interpretation: Cough analysis using speech foundation models, especially when combined with demographic and clinical data, showed strong potential as a TB triage tool, meeting WHO target product profile benchmarks. The model was robust to confounding factors including background noise, recording time, and device variability, indicating detection of genuine disease-related acoustic patterns. Further validation across diverse regions and case definitions, including subclinical TB, is required before clinical use.
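The sensitivity and specificity figures above depend on the decision threshold (0.38 in the abstract). As a rough illustration of how such figures follow from model scores, here is a minimal sketch in plain Python. It assumes that a score at or above the threshold counts as a positive screen, which the abstract does not specify, and the function name and toy data are hypothetical.

```python
def sensitivity_specificity(scores, labels, threshold=0.38):
    """Return (sensitivity, specificity) for binary labels at a score threshold.

    Assumption: score >= threshold is treated as a positive screen.
    """
    tp = fn = tn = fp = 0
    for score, label in zip(scores, labels):
        predicted_positive = score >= threshold
        if label == 1:
            if predicted_positive:
                tp += 1
            else:
                fn += 1
        else:
            if predicted_positive:
                fp += 1
            else:
                tn += 1
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative example")
    sensitivity = tp / (tp + fn)  # share of TB+ cases flagged
    specificity = tn / (tn + fp)  # share of non-TB cases cleared
    return sensitivity, specificity


# Hypothetical toy scores (1 = TB+, 0 = not TB); not values from the paper.
toy_scores = [0.9, 0.5, 0.38, 0.2, 0.6, 0.3, 0.1, 0.05]
toy_labels = [1, 1, 1, 1, 0, 0, 0, 0]
toy_sens, toy_spec = sensitivity_specificity(toy_scores, toy_labels)
```

Lowering the threshold trades specificity for sensitivity, which is why a triage tool is usually reported at a single operating point as well as by AUROC.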

Problem

Research questions and friction points this paper is trying to address.

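Screening for TB from cough sounds in high-burden, low-resource settings

Overcoming prior studies' small datasets, under-representation of symptomatic non-TB patients, simple models, and idealised recording conditions

Testing model robustness to background noise, recording time, and device variability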

Innovation

Methods, ideas, or system contributions that make the work stand out.

Using speech foundation models for cough analysis

Combining audio with demographic and clinical features

Training deep learning classifiers on 3-second segments of cough recordings
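The abstract states that the best model was trained on 3-second segments but does not describe how recordings were segmented or how segment-level outputs were combined. The sketch below shows one plausible arrangement in plain Python: non-overlapping fixed-length windows, with per-segment scores averaged into a single participant-level score. The function names, the dropping of a trailing partial segment, and the mean pooling are all assumptions for illustration, not the authors' pipeline.

```python
def split_into_segments(samples, sample_rate, segment_seconds=3.0):
    """Split a waveform into non-overlapping fixed-length segments.

    Assumption: a trailing partial segment is dropped rather than padded.
    """
    seg_len = int(round(segment_seconds * sample_rate))
    if seg_len <= 0:
        raise ValueError("segment length must be at least one sample")
    last_start = len(samples) - seg_len
    return [samples[i:i + seg_len] for i in range(0, last_start + 1, seg_len)]


def mean_pool(segment_scores):
    """Average per-segment scores into one participant-level score (assumed)."""
    if not segment_scores:
        raise ValueError("no segment scores to pool")
    return sum(segment_scores) / len(segment_scores)


# Toy waveform: 30 samples at a hypothetical 4 Hz rate, so 3 s = 12 samples.
toy_wave = list(range(30))
toy_segments = split_into_segments(toy_wave, sample_rate=4)

# Hypothetical per-segment classifier outputs, pooled per participant.
toy_participant_score = mean_pool([0.2, 0.4, 0.9])
```

With real audio the same slicing would apply to a sample array at the recording's actual sample rate, and each segment would be passed through the speech foundation model before pooling.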
