🤖 AI Summary
Problem: 3D medical image classification suffers from data-scarcity-induced distributional bias, low adaptation efficiency, and narrow task coverage. Method: We propose AnyMC3D, a unified framework that freezes a general-purpose 2D vision foundation model (e.g., a ViT) as a shared backbone and adds lightweight task-specific adapters (~1M parameters each), enabling multi-view fusion, pixel-level auxiliary supervision, and Grad-CAM-based explainable heatmap generation. Contribution/Results: We demonstrate, both theoretically and empirically and for the first time, that carefully adapted 2D foundation models can outperform dedicated 3D architectures. We introduce the first comprehensive 3D classification benchmark spanning 12 diverse tasks across pathologies, anatomies, and imaging modalities. AnyMC3D establishes a scalable, unified multi-task paradigm that eliminates the "one-task-one-model" approach: it achieves state-of-the-art performance on all 12 tasks, wins 1st place in the VLM3D challenge, accelerates inference by 3.2×, and reduces parameter count by 98%.
📝 Abstract
3D medical image classification is essential for modern clinical workflows. Medical foundation models (FMs) have emerged as a promising approach for scaling to new tasks, yet current research suffers from three critical pitfalls: data-regime bias, suboptimal adaptation, and insufficient task coverage. In this paper, we address these pitfalls and introduce AnyMC3D, a scalable 3D classifier adapted from 2D FMs. Our method scales efficiently to new tasks by adding only lightweight plugins (about 1M parameters per task) on top of a single frozen backbone. This versatile framework also supports multi-view inputs, auxiliary pixel-level supervision, and interpretable heatmap generation. We establish a comprehensive benchmark of 12 tasks covering diverse pathologies, anatomies, and modalities, and systematically analyze state-of-the-art 3D classification techniques. Our analysis reveals key insights: (1) effective adaptation is essential to unlock FM potential, (2) general-purpose FMs can match medical-specific FMs if properly adapted, and (3) 2D-based methods surpass 3D architectures for 3D classification. For the first time, we demonstrate the feasibility of achieving state-of-the-art performance across diverse applications using a single scalable framework (including 1st place in the VLM3D challenge), eliminating the need for separate task-specific models.
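The core design described above, a single frozen 2D backbone shared across tasks with a small trainable plugin per task, can be sketched as follows. This is a minimal illustrative sketch, not the authors' code: the tiny convolutional "backbone", the adapter layer sizes, and the 2-class head are placeholder assumptions standing in for a frozen 2D vision foundation model and the paper's ~1M-parameter adapters.

```python
import torch
import torch.nn as nn

# Placeholder "backbone": stands in for a frozen 2D vision foundation
# model (e.g., a ViT). Layer sizes are illustrative only.
backbone = nn.Sequential(
    nn.Conv2d(1, 16, kernel_size=3, padding=1),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
)
for p in backbone.parameters():
    p.requires_grad = False  # backbone stays frozen and is shared by all tasks

# Lightweight task-specific adapter: the only trainable part per task.
# (The paper's adapters are ~1M parameters; this toy head is far smaller.)
adapter = nn.Sequential(nn.Linear(16, 64), nn.ReLU(), nn.Linear(64, 2))

# A batch of 2D views (e.g., slices) taken from a 3D volume.
x = torch.randn(4, 1, 32, 32)
logits = adapter(backbone(x))  # shape: (4, 2)

trainable = sum(p.numel() for p in adapter.parameters() if p.requires_grad)
frozen = sum(p.numel() for p in backbone.parameters())
```

Scaling to a new task then means training only a new `adapter` against the same frozen `backbone`, which is what keeps the per-task cost at roughly 1M parameters instead of a full model.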