🤖 AI Summary
This study addresses three challenges in CT contrast-phase classification: the high computational cost of 3D models, strong dependence on labeled data, and poor cross-center generalizability. It proposes a 2D vision foundation model for CT phase identification, which the authors describe as the first of its kind. Methodologically, a DeepLesion-pretrained 2D Vision Transformer (ViT) extracts single-slice CT embeddings, which are passed to a lightweight classification head; performance is validated on the multi-center datasets VinDr and WAW-TACE. The study reports that the 2D foundation model matches or outperforms the 3D baselines (3D CNN, ResNet3D, and SlowFast) in cross-domain robustness, training efficiency, and labeling efficiency. Results show F1 scores of 99.2%/94.2%/93.1% for the non-contrast/arterial/venous phases on VinDr, and AUROCs of 91.0%/85.6% for the non-contrast/arterial phases on WAW-TACE, while training 3.2× faster than the 3D baselines. The work points toward low-cost, generalizable foundation models for medical imaging.
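The "slice embedding plus lightweight head" design described above can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration: the linear softmax head, the mean pooling of slice probabilities into a scan-level prediction, and the names `linear_head` and `classify_scan` are hypothetical and are not taken from the paper, which does not specify these details in the summary. The embeddings are assumed to be precomputed by a frozen 2D encoder.

```python
import math

# Class names follow the four categories reported in the abstract.
PHASES = ["non-contrast", "arterial", "venous", "other"]


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def linear_head(embedding, weights, bias):
    """Hypothetical lightweight head: one linear layer (one weight row per class)."""
    return [
        sum(w * x for w, x in zip(row, embedding)) + b
        for row, b in zip(weights, bias)
    ]


def classify_scan(slice_embeddings, weights, bias):
    """Average per-slice class probabilities to get a scan-level phase.

    Mean pooling is an assumption of this sketch, not the paper's stated method.
    """
    if not slice_embeddings:
        raise ValueError("need at least one slice embedding")
    probs = [softmax(linear_head(e, weights, bias)) for e in slice_embeddings]
    n = len(probs)
    mean = [sum(p[k] for p in probs) / n for k in range(len(PHASES))]
    best = max(range(len(PHASES)), key=mean.__getitem__)
    return PHASES[best], mean
```

Because the encoder stays frozen in this sketch, only the small head would need training, which is one plausible reason a 2D embedding approach can train faster than an end-to-end 3D network.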
📝 Abstract
Purpose: To harness the efficiency of a 2D foundation model to develop a robust contrast phase classifier that is resilient to domain shifts. Materials and Methods: This retrospective study utilized three public datasets from separate institutions. A 2D foundation model was trained on the DeepLesion dataset (mean age: 51.2, s.d.: 17.6; 2398 males) to generate embeddings from 2D CT slices for downstream contrast phase classification. The classifier was trained on the VinDr Multiphase dataset and externally validated on the WAW-TACE dataset. The 2D model was also compared to three 3D supervised models. Results: On the VinDr dataset (146 male, 63 female, 56 unidentified), the model achieved near-perfect AUROC scores and F1 scores of 99.2%, 94.2%, and 93.1% for non-contrast, arterial, and venous phases, respectively. The 'Other' category scored lower (F1: 73.4%) because it combines multiple contrast phases into one class. On the WAW-TACE dataset (mean age: 66.1, s.d.: 10.0; 185 males), the model showed strong performance on the non-contrast and arterial phases, with AUROCs of 91.0% and 85.6% and F1 scores of 87.3% and 74.1%, respectively. Venous phase performance was lower, with an AUROC of 81.7% and an F1 score of 70.2%, due to label mismatches. Compared to the 3D supervised models, the approach trained faster, performed as well or better, and showed greater robustness to domain shifts. Conclusion: The robustness of the 2D foundation model may be useful for automating hanging protocols and data orchestration in the clinical deployment of AI algorithms.
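The per-phase F1 and AUROC figures above are one-vs-rest metrics: each phase in turn is treated as the positive class and all other phases as negative. A minimal sketch of how such scores can be computed is given below. The function names and the toy labels in the usage note are hypothetical, and the paper's own evaluation code may differ (for example in how ties or averaging are handled).

```python
def f1_one_vs_rest(y_true, y_pred, positive):
    """F1 for one class: 2*TP / (2*TP + FP + FN), treating all other classes as negative."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def auroc_one_vs_rest(y_true, scores, positive):
    """AUROC for one class as the probability that a positive case outscores a negative one.

    `scores` holds the predicted probability of `positive` for each case.
    Ties count as half a win (the Mann-Whitney U formulation).
    """
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))
```

For example, with true labels `["arterial", "arterial", "venous", "non-contrast"]` and predictions `["arterial", "venous", "venous", "non-contrast"]`, the arterial class has one true positive, no false positives, and one false negative, so its F1 is 2/3. A class can therefore have a high AUROC and a noticeably lower F1 when its scores rank well but the decision threshold misassigns borderline cases.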