🤖 AI Summary
This study addresses performance disparities of deep learning models across sex, age, and racial subgroups in chest X-ray diagnosis. We propose a model-agnostic, lightweight fairness optimization framework. Methodologically, we replace the final CNN layer with an XGBoost classifier to form a multi-label CNN-XGBoost architecture, and we compare it against adversarial training, reweighting, data augmentation, and active learning, as well as combining it with active learning. Evaluated on CheXpert and MIMIC-CXR, the framework mitigates bias both in distribution and out of distribution. Results demonstrate substantial improvements in subgroup fairness, for example a 32–47% reduction in equal opportunity difference, while maintaining or surpassing baseline accuracy. Computational overhead is also much lower than that of conventional fairness-aware deep learning methods, which retrain the full model. The approach enhances clinical deployability by improving both the trustworthiness and the inclusivity of AI-assisted radiological decision support.
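The fairness metric named above, equal opportunity difference, is commonly defined as the gap in true positive rate (TPR) between demographic subgroups. The following is a minimal sketch under that common definition, using the largest pairwise gap when there are more than two subgroups; the function names and the toy data are hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def true_positive_rates(y_true, y_pred, groups):
    """Return {group: TPR} for one binary finding (1 = present)."""
    hits = defaultdict(int)       # correctly detected positives per group
    positives = defaultdict(int)  # all true positives per group
    for truth, pred, group in zip(y_true, y_pred, groups):
        if truth == 1:
            positives[group] += 1
            hits[group] += int(pred == 1)
    # Groups with no positive cases are omitted (their TPR is undefined).
    return {g: hits[g] / positives[g] for g in positives}


def equal_opportunity_difference(y_true, y_pred, groups):
    """Largest TPR gap between any two subgroups; 0 means parity."""
    rates = true_positive_rates(y_true, y_pred, groups)
    return max(rates.values()) - min(rates.values())


# Toy data, made up for illustration only.
y_true = [1, 1, 1, 1, 0, 1, 1, 1, 1, 0]
y_pred = [1, 1, 1, 0, 0, 1, 0, 0, 1, 1]
groups = ["F"] * 5 + ["M"] * 5

gap = equal_opportunity_difference(y_true, y_pred, groups)
print(f"Equal opportunity difference: {gap:.2f}")
```

In a multi-label setting such as the one described here, a metric like this would typically be computed separately for each condition and each demographic attribute.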
📝 Abstract
Deep learning models have shown promise in improving diagnostic accuracy from chest X-rays, but they also risk perpetuating healthcare disparities when performance varies across demographic groups. In this work, we present a comprehensive bias detection and mitigation framework targeting sex-, age-, and race-based disparities in chest X-ray diagnosis. We extend a recent CNN-XGBoost pipeline to support multi-label classification and evaluate its performance across four medical conditions. We show that replacing the final layer of the CNN with an eXtreme Gradient Boosting (XGBoost) classifier improves subgroup fairness while maintaining or improving overall predictive performance. To validate its generalizability, we apply the method to two different backbones, DenseNet-121 and ResNet-50, and achieve similarly strong performance and fairness outcomes, confirming its model-agnostic design. We further compare this lightweight adapter training method with traditional bias mitigation techniques that require full-model training, including adversarial training, reweighting, data augmentation, and active learning, and find that our approach offers competitive or superior bias reduction at a fraction of the computational cost. Finally, we show that combining XGBoost retraining with active learning yields the largest reduction in bias across all demographic subgroups, both in and out of distribution on the CheXpert and MIMIC datasets, establishing a practical and effective path toward equitable deep learning deployment in clinical radiology.
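Of the baselines listed above, reweighting is the simplest to illustrate. The sketch below uses one common formulation, the Kamiran and Calders "reweighing" scheme, in which each sample receives the weight P(group) * P(label) / P(group, label), so that group membership and label become statistically independent in the weighted data. This is an assumption made for illustration: the abstract does not say which reweighting scheme the authors used, and the function name and toy data are hypothetical.

```python
from collections import Counter


def reweighing_weights(labels, groups):
    """Per-sample weights w = P(group) * P(label) / P(group, label).

    Under-represented (group, label) cells get weights above 1 and
    over-represented cells get weights below 1.
    """
    n = len(labels)
    group_counts = Counter(groups)
    label_counts = Counter(labels)
    cell_counts = Counter(zip(groups, labels))
    return [
        group_counts[g] * label_counts[y] / (n * cell_counts[(g, y)])
        for g, y in zip(groups, labels)
    ]


# Toy data, made up for illustration: positives are common in group A
# and rare in group B.
labels = [1, 1, 1, 0, 1, 0, 0, 0]
groups = ["A", "A", "A", "A", "B", "B", "B", "B"]

weights = reweighing_weights(labels, groups)
for g, y, w in zip(groups, labels, weights):
    print(g, y, round(w, 3))
```

Weights of this kind would typically be passed to the training loss as per-sample weights. Because the weights sum to the number of samples, the overall scale of the loss is unchanged.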