🤖 AI Summary
To address class imbalance and high annotation costs in chest X-ray (CXR) severity classification of lung disease, this paper proposes a lightweight Bayesian deep active learning framework. It approximates a Bayesian neural network with Monte Carlo Dropout, uses a weighted loss function to mitigate class imbalance, and compares uncertainty-based acquisition functions, including entropy sampling and mean standard deviation (Mean STD) sampling, for selecting the most informative samples to label. In binary classification (normal vs. diseased), Entropy Sampling reaches 93.7% accuracy (AUROC 0.91) using only 15.4% of the training data; in the multi-class setting, Mean STD sampling reaches 70.3% accuracy (AUROC 0.86) with 23.1% of the labeled data. Both outperform more complex and computationally expensive acquisition functions. The work applies uncertainty-driven Bayesian active learning to CXR severity grading and shows that labeling requirements can be reduced substantially while diagnostic performance is maintained or exceeded.
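The weighted loss mentioned above can be illustrated with a minimal sketch. The source does not state which weighting scheme was used, so inverse class frequency is an assumption here, and the helper names `inverse_frequency_weights` and `weighted_cross_entropy` are hypothetical. The loss is normalized by the sum of the per-sample weights, which is one common convention for a class-weighted mean.

```python
import math
from collections import Counter


def inverse_frequency_weights(labels):
    """Assumed scheme: weight_c = n / (k * n_c), so rare classes weigh more."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {c: n / (k * counts[c]) for c in counts}


def weighted_cross_entropy(probs, labels, weights):
    """Class-weighted mean negative log-likelihood.

    probs   -- list of per-sample probability vectors (indexed by class)
    labels  -- list of integer class labels
    weights -- dict mapping class label to its weight
    """
    total = sum(weights[y] * -math.log(p[y]) for p, y in zip(probs, labels))
    norm = sum(weights[y] for y in labels)
    return total / norm


# Toy imbalanced set: three "normal" (0) images and one "severe" (1) image.
labels = [0, 0, 0, 1]
probs = [[0.9, 0.1]] * 4          # a model that always predicts "normal"
weights = inverse_frequency_weights(labels)
loss = weighted_cross_entropy(probs, labels, weights)
```

With these weights, the single misclassified minority sample contributes as much to the loss as the three majority samples combined, so a model that ignores the rare class is penalized more heavily than under an unweighted loss.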
📝 Abstract
To reduce the amount of labeled data required for lung disease severity classification from chest X-rays (CXRs) under class imbalance, this study applied deep active learning with a Bayesian Neural Network (BNN) approximation and a weighted loss function. This retrospective study collected 2,319 CXRs from 963 patients (mean age, 59.2 $\pm$ 16.6 years; 481 female) at Emory Healthcare-affiliated hospitals between January and November 2020. All patients had clinically confirmed COVID-19. Each CXR was independently labeled by 3 to 6 board-certified radiologists as normal, moderate, or severe. A deep neural network with Monte Carlo Dropout was trained using active learning to classify disease severity. Several acquisition functions were used to iteratively select the most informative samples from an unlabeled pool. Performance was evaluated using accuracy, area under the receiver operating characteristic curve (AUROC), and area under the precision-recall curve (AUPRC). Training time and acquisition time were also recorded. Statistical analysis included descriptive metrics and performance comparisons across acquisition strategies. Entropy Sampling achieved 93.7% accuracy (AUROC, 0.91) in binary classification (normal vs. diseased) using 15.4% of the training data. In the multi-class setting, Mean STD sampling achieved 70.3% accuracy (AUROC, 0.86) using 23.1% of the labeled data. These methods outperformed more complex and computationally expensive acquisition functions and significantly reduced labeling needs. Deep active learning with BNN approximation and weighted loss effectively reduces labeled data requirements while addressing class imbalance, maintaining or exceeding diagnostic performance.
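The two best-performing acquisition functions can be sketched as follows. The abstract does not give their formulas, so this uses the definitions common in the Bayesian active learning literature as an assumption: predictive entropy of the softmax output averaged over T stochastic (dropout-enabled) forward passes, and Mean STD as the per-class standard deviation across those passes, averaged over classes. The function names and the toy probabilities are hypothetical and are not data from the study.

```python
import math
from statistics import fmean, pstdev


def predictive_entropy(mc_probs):
    """Entropy of the mean softmax vector over T Monte Carlo Dropout passes.

    mc_probs -- list of T probability vectors for a single image
    """
    n_classes = len(mc_probs[0])
    mean_probs = [fmean(p[c] for p in mc_probs) for c in range(n_classes)]
    return -sum(q * math.log(q) for q in mean_probs if q > 0.0)


def mean_std(mc_probs):
    """Per-class standard deviation across passes, averaged over classes."""
    n_classes = len(mc_probs[0])
    return fmean(pstdev([p[c] for p in mc_probs]) for c in range(n_classes))


def select_batch(pool, score_fn, k):
    """Return the ids of the k pool images with the highest uncertainty score.

    pool -- dict mapping image id to its list of T probability vectors
    """
    return sorted(pool, key=lambda sid: score_fn(pool[sid]), reverse=True)[:k]


# Toy pool with T = 3 passes over three classes (normal, moderate, severe).
pool = {
    "cxr_a": [[0.98, 0.01, 0.01]] * 3,                               # passes agree
    "cxr_b": [[0.9, 0.05, 0.05], [0.1, 0.8, 0.1], [0.3, 0.3, 0.4]],  # passes disagree
}
to_label = select_batch(pool, mean_std, k=1)
```

In an active learning loop, the selected images would be sent for annotation, moved from the unlabeled pool to the training set, and the model retrained before the next acquisition round. Both scores need only the T forward passes already produced by Monte Carlo Dropout, which is consistent with the abstract's point that they are cheaper than more complex acquisition functions.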