🤖 AI Summary
This work addresses the challenge of robust COVID-19 detection and multi-class pulmonary disease classification in multi-source chest CT data by proposing a fusion approach that integrates 2.5D and 3D deep learning models. Specifically, it combines a pre-trained 2.5D DINOv3 Vision Transformer, which processes multi-view slices, with a 3D ResNet-18 pretrained via Variance Risk Extrapolation (VREx) and supervised contrastive learning to handle volumetric data. Slice-level and volume-level predictions are fused through a logit-level ensemble. Evaluated on the PHAROS-AIF-MIH benchmark, the method achieves 94.48% accuracy (Macro F1: 0.9426) in binary classification and 79.35% accuracy (Macro F1: 0.7497) in multi-class classification, enhancing both diagnostic performance and generalization capability.
📝 Abstract
We propose a deep learning framework for COVID-19 detection and disease classification from chest CT scans that integrates both 2.5D and 3D representations to capture complementary slice-level and volumetric information. The 2.5D branch processes multi-view CT slices (axial, coronal, sagittal) using a DINOv3 vision transformer to extract robust visual features, while the 3D branch employs a ResNet-18 architecture to model volumetric context and is pretrained with Variance Risk Extrapolation (VREx) followed by supervised contrastive learning to improve cross-source robustness. Predictions from both branches are combined through logit-level ensemble inference. Experiments on the PHAROS-AIF-MIH benchmark demonstrate the effectiveness of the proposed approach: for binary COVID-19 detection, the ensemble achieves 94.48% accuracy and a 0.9426 Macro F1-score, outperforming both individual models, while for multi-class disease classification the 2.5D DINOv3 model achieves the best performance with 79.35% accuracy and a 0.7497 Macro F1-score. These results highlight the benefit of combining pretrained slice-based representations with volumetric modeling for robust multi-source medical imaging analysis. Code is available at https://github.com/HySonLab/PHAROS-AIF-MIH.
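The logit-level ensemble described above can be sketched as follows. This is a minimal illustration, not the released implementation: it assumes the two branches produce per-scan class logits of the same shape and that they are combined with a simple weighted average (the `weight` parameter and equal 0.5 default are assumptions; the paper does not specify the mixing rule here) before a softmax and argmax produce the final prediction.

```python
import numpy as np

def ensemble_logits(logits_2p5d: np.ndarray, logits_3d: np.ndarray, weight: float = 0.5):
    """Fuse two branches at the logit level: weighted average, then softmax.

    logits_2p5d, logits_3d: arrays of shape (batch, num_classes).
    Returns (predicted class indices, class probabilities).
    """
    fused = weight * logits_2p5d + (1.0 - weight) * logits_3d
    # numerically stable softmax over the class dimension
    shifted = fused - fused.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    probs = exp / exp.sum(axis=-1, keepdims=True)
    return probs.argmax(axis=-1), probs

# Toy example: binary task (COVID vs. non-COVID), batch of 2 scans
logits_vit = np.array([[2.0, 0.5], [0.1, 1.2]])   # hypothetical 2.5D branch logits
logits_3d = np.array([[1.5, 0.2], [0.3, 0.9]])    # hypothetical 3D branch logits
preds, probs = ensemble_logits(logits_vit, logits_3d)
```

An advantage of fusing at the logit level rather than the feature level is that the two branches can be trained entirely independently and combined only at inference time.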