🤖 AI Summary
Early, non-invasive asthma screening in children remains a clinical challenge. Method: This study proposes an AI-based respiratory sound analysis framework: (1) the first adaptation of Google’s HeAR health audio foundation model to pediatric respiratory disease recognition; (2) the use of SPRSound, the first open-access, multi-age pediatric respiratory sound dataset, annotated for children aged 1 month to 18 years; and (3) robust classification from only 2-second audio clips under low-resource conditions. Results/Contributions: The system achieves over 91% accuracy with high sensitivity and precision for asthma cases, validated by ROC/AUC analysis and confusion-matrix evaluation. It is lightweight enough to run on edge devices, enabling deployment in remote and resource-constrained settings. Key contributions include: (i) the pioneering transfer of HeAR to pediatric respiratory acoustics; (ii) a benchmark evaluation on the open-access SPRSound dataset; and (iii) a clinically viable, efficient screening paradigm explicitly designed for real-world constraints.
📝 Abstract
Early detection of asthma in children is crucial to prevent long-term respiratory complications and reduce emergency interventions. This work presents an AI-powered diagnostic pipeline that leverages Google's Health Acoustic Representations (HeAR) model to detect early signs of asthma from pediatric respiratory sounds. The SPRSound dataset, the first open-access collection of annotated respiratory sounds in children aged 1 month to 18 years, is used to extract 2-second audio segments labeled as wheeze, crackle, rhonchi, stridor, or normal. Each segment is embedded into a 512-dimensional representation using HeAR, a foundation model pretrained on 300 million health-related audio clips, including 100 million cough sounds. Multiple classifiers, including SVM, Random Forest, and MLP, are trained on these embeddings to distinguish between asthma-indicative and normal sounds. The system achieves over 91% accuracy, with strong precision and recall on positive cases. In addition to classification, the learned embeddings are visualized using PCA, misclassifications are analyzed through waveform playback, and ROC and confusion-matrix insights are provided. This method demonstrates that short, low-resource pediatric recordings, when paired with audio foundation models, can enable fast, non-invasive asthma screening. The approach is especially promising for digital diagnostics in remote or underserved healthcare settings.
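The classification stage described above (2-second clips → 512-dimensional HeAR embeddings → a conventional classifier such as an SVM) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the `embed_clips` function is a hypothetical stand-in for the HeAR encoder (whose loading API is not specified here), and the synthetic 512-dimensional vectors merely mimic the embedding shape so the downstream sklearn steps are runnable.

```python
# Hedged sketch: embeddings -> SVM classifier -> accuracy and ROC/AUC.
# embed_clips() is a PLACEHOLDER for the HeAR encoder; real embeddings
# would come from the pretrained model, one 512-dim vector per 2-s clip.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score, roc_auc_score

rng = np.random.default_rng(0)

def embed_clips(n_clips: int, label: int) -> np.ndarray:
    """Hypothetical stand-in for HeAR: one 512-dim embedding per clip.

    Class means are slightly separated so the toy classifier has
    signal to learn; real embeddings would replace this entirely.
    """
    center = 0.5 if label == 1 else -0.5
    return rng.normal(center, 1.0, size=(n_clips, 512))

# Synthetic "normal" (0) vs "asthma-indicative" (1) embeddings.
X = np.vstack([embed_clips(200, 0), embed_clips(200, 1)])
y = np.array([0] * 200 + [1] * 200)

X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# One of the classifiers named in the abstract; RandomForestClassifier
# or MLPClassifier could be swapped in with the same fit/predict API.
clf = SVC(kernel="rbf", probability=True).fit(X_tr, y_tr)

acc = accuracy_score(y_te, clf.predict(X_te))
auc = roc_auc_score(y_te, clf.predict_proba(X_te)[:, 1])
```

On real HeAR embeddings, `acc` and `auc` would correspond to the reported >91% accuracy and the ROC/AUC analysis; here they only demonstrate the evaluation plumbing on separable synthetic data.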