🤖 AI Summary
This work addresses the challenge of robust COVID-19 detection and multi-class pulmonary disease classification in multi-source chest CT data by proposing a fusion approach that integrates 2.5D and 3D deep learning models. Specifically, it combines a pre-trained 2.5D DINOv3 Vision Transformer, which processes multi-view slices, with a 3D ResNet-18 pretrained via Variance Risk Extrapolation (VREx) and supervised contrastive learning to handle volumetric data. Slice-level and volume-level predictions are fused through a logit-level ensemble. Evaluated on the PHAROS-AIF-MIH benchmark, the method achieves 94.48% accuracy (Macro F1: 0.9426) in binary classification and 79.35% accuracy (Macro F1: 0.7497) in multi-class classification, enhancing both diagnostic performance and generalization capability.
📝 Abstract
We propose a deep learning framework for COVID-19 detection and disease classification from chest CT scans that integrates both 2.5D and 3D representations to capture complementary slice-level and volumetric information. The 2.5D branch processes multi-view CT slices (axial, coronal, sagittal) using a DINOv3 vision transformer to extract robust visual features, while the 3D branch employs a ResNet-18 architecture to model volumetric context and is pretrained with Variance Risk Extrapolation (VREx) followed by supervised contrastive learning to improve cross-source robustness. Predictions from both branches are combined through logit-level ensemble inference. Experiments on the PHAROS-AIF-MIH benchmark demonstrate the effectiveness of the proposed approach: for binary COVID-19 detection, the ensemble achieves 94.48% accuracy and a 0.9426 Macro F1-score, outperforming both individual models, while for multi-class disease classification the 2.5D DINOv3 model achieves the best performance with 79.35% accuracy and a 0.7497 Macro F1-score. These results highlight the benefit of combining pretrained slice-based representations with volumetric modeling for robust multi-source medical imaging analysis. Code is available at https://github.com/HySonLab/PHAROS-AIF-MIH.
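The logit-level ensemble described above can be sketched as follows. This is a minimal illustration, not the released implementation: it assumes the two branches produce per-scan class logits of the same shape and that they are combined with a simple weighted average (the `weight` parameter and equal 0.5 default are assumptions; the paper does not specify the mixing rule here) before a softmax and argmax produce the final prediction.

```python
import numpy as np

def ensemble_logits(logits_2p5d: np.ndarray, logits_3d: np.ndarray, weight: float = 0.5):
    """Fuse two branches at the logit level: weighted average, then softmax.

    logits_2p5d, logits_3d: arrays of shape (batch, num_classes).
    Returns (predicted class indices, class probabilities).
    """
    fused = weight * logits_2p5d + (1.0 - weight) * logits_3d
    # numerically stable softmax over the class dimension
    shifted = fused - fused.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    probs = exp / exp.sum(axis=-1, keepdims=True)
    return probs.argmax(axis=-1), probs

# Toy example: binary task (COVID vs. non-COVID), batch of 2 scans
logits_vit = np.array([[2.0, 0.5], [0.1, 1.2]])   # hypothetical 2.5D branch logits
logits_3d = np.array([[1.5, 0.2], [0.3, 0.9]])    # hypothetical 3D branch logits
preds, probs = ensemble_logits(logits_vit, logits_3d)
```

An advantage of fusing at the logit level rather than the feature level is that the two branches can be trained entirely independently and combined only at inference time.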