🤖 AI Summary
To address domain shift in multi-center chest CT data arising from variations in imaging protocols, acquisition devices, and patient populations, this paper proposes a training framework that combines the Variance Risk Extrapolation (VREx) regularizer with Mixup data augmentation to learn domain-invariant representations and improve cross-center generalization for COVID-19 infection classification. VREx penalizes the variance of empirical risks across source domains to suppress center-specific biases, while Mixup interpolates the inputs and labels of sample pairs, encouraging linear behavior between training examples and improving robustness to noise. Evaluated on four independent clinical centers, the method achieves an average macro-F1 score of 0.96, significantly outperforming baseline approaches, and demonstrates strong stability and generalizability. This work offers an interpretable, deployment-friendly regularization strategy for domain generalization in medical imaging.
📝 Abstract
We present our solution for the Multi-Source COVID-19 Detection Challenge, which aims to classify chest CT scans into COVID and Non-COVID categories across data collected from four distinct hospitals and medical centers. A major challenge in this task lies in the domain shift caused by variations in imaging protocols, scanners, and patient populations across institutions. To enhance the cross-domain generalization of our model, we incorporate Variance Risk Extrapolation (VREx) into the training process. VREx encourages the model to maintain consistent performance across multiple source domains by explicitly minimizing the variance of empirical risks across environments. This regularization strategy reduces overfitting to center-specific features and promotes learning of domain-invariant representations. We further apply Mixup data augmentation to improve generalization and robustness. Mixup interpolates both the inputs and labels of randomly selected pairs of training samples, encouraging the model to behave linearly between examples and enhancing its resilience to noise and limited data. Our method achieves an average macro F1 score of 0.96 across the four sources on the validation set, demonstrating strong generalization.
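The two components described above, the VREx variance penalty over per-domain risks and Mixup input-label interpolation, can be sketched in a few lines. This is a minimal illustration, not the authors' implementation; the penalty weight `beta` and Mixup concentration `alpha` are assumed values, and the paper's actual hyperparameters are not specified here.

```python
import numpy as np

def vrex_objective(env_risks, beta=10.0):
    """VREx training objective: mean empirical risk across source
    domains plus beta times the variance of those risks. The variance
    term pushes the model toward consistent performance in every
    domain (beta is a hypothetical weight)."""
    risks = np.asarray(env_risks, dtype=float)
    return risks.mean() + beta * risks.var()

def mixup(x1, y1, x2, y2, alpha=0.2, rng=None):
    """Mixup: draw lam ~ Beta(alpha, alpha) and form convex
    combinations of both the inputs and the (one-hot) labels
    of two training samples."""
    rng = rng or np.random.default_rng()
    lam = rng.beta(alpha, alpha)
    x = lam * x1 + (1.0 - lam) * x2
    y = lam * y1 + (1.0 - lam) * y2
    return x, y
```

In practice the per-domain risks would be cross-entropy losses computed on Mixup-augmented mini-batches from each of the four centers; when all domains incur equal risk, the variance term vanishes and the objective reduces to ordinary empirical risk minimization.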