🤖 AI Summary
This study addresses the limited generalizability of deep learning models in multi-center CT imaging, which stems from heterogeneity in acquisition devices, protocols, and patient populations and is further exacerbated by imbalanced data distributions across centers. To improve robustness and reduce bias toward centers that contribute more data, the authors propose a multi-task learning framework that jointly predicts the COVID-19 diagnosis and the imaging site (one of four centers) using a shared EfficientNet-B7 backbone. A logit-adjusted cross-entropy loss on the site-classification head counteracts the uneven number of scans per center, and preprocessing follows the SSFL framework with KDS, selecting eight representative slices per scan. Evaluated on a validation set of 308 scans, the method achieves an F1 score of 0.9098 and an AUC-ROC of 0.9647, outperforming single-task baselines and demonstrating improved performance in imbalanced, multi-center settings.
📝 Abstract
Deep learning models for COVID-19 detection from chest CT scans generally perform well when the training and test data originate from the same institution, but they often struggle when scans are drawn from multiple centres with differing scanners, imaging protocols, and patient populations. One key reason is that existing methods treat COVID-19 classification as the sole training objective, without accounting for the data source of each scan. As a result, the learned representations tend to be biased toward centres that contribute more training data. To address this, we propose a multi-task learning approach in which the model is trained to predict both the COVID-19 diagnosis and the originating data centre. The two tasks share an EfficientNet-B7 backbone, which encourages the feature extractor to learn representations that generalize across all four participating centres. Because the training data are not evenly distributed across sources, we apply a logit-adjusted cross-entropy loss [1] to the source classification head so that underrepresented centres are not overlooked. Our pre-processing follows the SSFL framework with KDS [2], selecting eight representative slices per scan. Our method achieves an F1 score of 0.9098 and an AUC-ROC of 0.9647 on a validation set of 308 scans. The code is publicly available at https://github.com/Purdue-M2/-multisource-covid-ct.
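The two-head objective described above can be sketched in plain Python. This is a minimal sketch under stated assumptions, not the authors' code: the backbone is any callable (a hypothetical stand-in for EfficientNet-B7), the diagnosis head uses ordinary cross-entropy, the source loss is scaled by a hypothetical weight `lam`, and the logit adjustment uses the common form that adds `tau * log(prior)` to each logit, with priors estimated from per-centre training counts and `tau = 1` assumed. All function names and the toy counts are illustrative.

```python
import math
from typing import Callable, List, Sequence

Vector = List[float]


def linear(weights: Sequence[Sequence[float]], x: Sequence[float]) -> Vector:
    """Matrix-vector product: one output logit per weight row."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


def log_softmax(logits: Sequence[float]) -> Vector:
    """Numerically stable log-softmax (subtract the max before exponentiating)."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - lse for z in logits]


def cross_entropy(logits: Sequence[float], target: int) -> float:
    """Standard cross-entropy for a single sample."""
    return -log_softmax(logits)[target]


def logit_adjusted_ce(logits: Sequence[float], target: int,
                      class_counts: Sequence[int], tau: float = 1.0) -> float:
    """Cross-entropy after adding tau * log(prior) to each logit.

    Priors come from training counts, so rare classes receive a larger negative
    offset and the model must score them higher to reach a low loss.
    """
    total = sum(class_counts)
    adjusted = [z + tau * math.log(c / total) for z, c in zip(logits, class_counts)]
    return cross_entropy(adjusted, target)


def multitask_loss(x: Sequence[float], covid_label: int, source_label: int,
                   backbone: Callable[[Sequence[float]], Vector],
                   covid_head: Sequence[Sequence[float]],
                   source_head: Sequence[Sequence[float]],
                   source_counts: Sequence[int],
                   lam: float = 1.0, tau: float = 1.0) -> float:
    """Joint objective: CE on the diagnosis head + lam * logit-adjusted CE on the source head."""
    features = backbone(x)  # shared representation, computed once and used by both heads
    covid_logits = linear(covid_head, features)
    source_logits = linear(source_head, features)
    loss_covid = cross_entropy(covid_logits, covid_label)
    loss_source = logit_adjusted_ce(source_logits, source_label, source_counts, tau)
    return loss_covid + lam * loss_source


if __name__ == "__main__":
    # Toy usage: identity "backbone", 2 diagnosis classes, 4 centres with hypothetical counts.
    identity = lambda v: list(v)
    loss = multitask_loss(
        x=[0.2, -0.4], covid_label=1, source_label=3, backbone=identity,
        covid_head=[[1.0, 0.0], [0.0, 1.0]],
        source_head=[[0.5, 0.1], [0.0, 0.3], [-0.2, 0.4], [0.1, -0.1]],
        source_counts=[400, 250, 120, 30],
        lam=0.5,
    )
    print(f"joint loss: {loss:.4f}")
```

With equal per-centre counts the adjustment adds the same constant to every logit, so the loss reduces to ordinary cross-entropy. With skewed counts, a scan from a rare centre incurs a larger loss unless its logit beats the others by a wider margin, which is the mechanism that keeps underrepresented centres from being ignored.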
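The pre-processing step reduces each variable-depth scan to a fixed set of eight slices. KDS [2] is not described here, so the sketch below uses a plain uniform-spacing sampler as an explicitly swapped-in stand-in; it is not KDS and makes no claim about how KDS scores slices. It only illustrates the input contract: any scan, whatever its depth, yields exactly eight slices. The function names are hypothetical.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def uniform_slice_indices(num_slices: int, k: int = 8) -> List[int]:
    """Pick k indices at the centres of k equal-width bins over [0, num_slices).

    Stand-in for a learned or density-based selector. When num_slices >= k the
    indices are distinct; when num_slices < k some slices repeat, so the output
    always has length k.
    """
    if num_slices < 1 or k < 1:
        raise ValueError("num_slices and k must be positive")
    return [int((i + 0.5) * num_slices / k) for i in range(k)]


def select_slices(volume: Sequence[T], k: int = 8) -> List[T]:
    """Return k slices from a scan given as an ordered sequence of 2-D slices."""
    return [volume[i] for i in uniform_slice_indices(len(volume), k)]
```

For an 80-slice scan this picks indices 5, 15, ..., 75. How the selected slices are combined into a scan-level prediction is left out, since the abstract does not describe it.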