Uncovering Modality Discrepancy and Generalization Illusion for General-Purpose 3D Medical Segmentation

📅 2026-02-07

🤖 AI Summary

This study addresses the overestimated generalization capability of existing 3D medical foundation models across imaging modalities, particularly when moving from structural imaging (CT, MRI) to functional imaging (PET). It also highlights the lack of systematic evaluation under realistic multimodal clinical conditions. The authors introduce the UMD dataset, comprising 490 whole-body PET/CT and 464 whole-body PET/MRI scans (approximately 675,000 2D images and 12,000 3D organ annotations). They use an intra-subject controlled design on paired scans that treats imaging modality as the independent variable, so that robustness across structural and functional modalities can be assessed directly. Their analysis reveals a “generalization illusion”, in which reported benchmark performance fails to carry over to real-world multimodal applications: models degrade significantly when transferring from structural to functional imaging and fall far short of true modality-agnostic capability. The findings advocate a paradigm shift toward multimodal training and evaluation frameworks to advance genuinely general-purpose medical foundation models.

📝 Abstract

While emerging 3D medical foundation models are envisioned as versatile tools that offer general-purpose capabilities, their validation remains largely confined to regional and structural imaging, leaving a significant modality discrepancy unexplored. To provide a rigorous and objective assessment, we curate the UMD dataset comprising 490 whole-body PET/CT and 464 whole-body PET/MRI scans ($\sim$675k 2D images, $\sim$12k 3D organ annotations) and conduct a comprehensive evaluation of representative 3D segmentation foundation models. Through intra-subject controlled comparisons of paired scans, we isolate imaging modality as the primary independent variable to evaluate model robustness in real-world applications. Our evaluation reveals a stark discrepancy between literature-reported benchmarks and real-world efficacy, particularly when transitioning from structural to functional domains. Such systemic failures underscore that current 3D foundation models are far from achieving truly general-purpose status, necessitating a paradigm shift toward multi-modal training and evaluation to bridge the gap between idealized benchmarking and comprehensive clinical utility. This dataset and analysis establish a foundation for future research toward truly modality-agnostic medical foundation models.
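The intra-subject design described in the abstract amounts to a paired comparison. The same model is scored on both scans of each subject, and the per-subject differences are inspected, so subject-level factors are held fixed and imaging modality is the main thing that varies. The sketch below is minimal and makes several assumptions. It uses Dice similarity as the segmentation metric (the abstract does not name one) and represents masks as flat 0/1 lists. The names `dice` and `paired_modality_gap`, and the toy masks, are hypothetical and illustrative only. They are not the authors' code and not results from the UMD dataset.

```python
from statistics import mean


def dice(pred, target):
    """Dice similarity coefficient for two binary masks given as flat 0/1 sequences."""
    if len(pred) != len(target):
        raise ValueError("masks must contain the same number of voxels")
    intersection = sum(1 for p, t in zip(pred, target) if p and t)
    total = sum(1 for p in pred if p) + sum(1 for t in target if t)
    # Convention: two empty masks count as perfect agreement.
    return 1.0 if total == 0 else 2.0 * intersection / total


def paired_modality_gap(structural, functional):
    """Per-subject Dice gap (structural minus functional) and its mean.

    Each argument maps subject ID -> (predicted mask, ground-truth mask).
    Pairing by subject holds the subject fixed, so imaging modality is the
    main factor that differs between the two scores.
    """
    if structural.keys() != functional.keys():
        raise ValueError("both modalities must cover the same subjects")
    gaps = {
        subject: dice(*structural[subject]) - dice(*functional[subject])
        for subject in sorted(structural)
    }
    return gaps, mean(gaps.values())


# Hypothetical toy masks (4 voxels each); not data from the UMD dataset.
structural_scans = {
    "subj01": ([1, 1, 0, 0], [1, 1, 0, 0]),
    "subj02": ([1, 0, 1, 0], [1, 1, 1, 0]),
}
functional_scans = {
    "subj01": ([1, 0, 0, 0], [1, 1, 0, 0]),
    "subj02": ([0, 0, 1, 0], [1, 1, 1, 0]),
}

gaps, mean_gap = paired_modality_gap(structural_scans, functional_scans)
print(gaps, round(mean_gap, 4))
```

A positive mean gap means the model segments the structural scans better than the functional scans of the same subjects. A real evaluation would aggregate per organ and add a paired significance test, both of which this sketch omits.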

Problem

modality discrepancy

generalization illusion

3D medical segmentation

foundation models

multi-modal evaluation

Innovation

modality discrepancy

generalization illusion

3D medical foundation models