Feature Quality and Adaptability of Medical Foundation Models: A Comparative Evaluation for Radiographic Classification and Segmentation

📅 2025-11-12
📈 Citations: 0
Influential: 0
🤖 AI Summary
Current medical foundation models (MFMs) lack systematic evaluation of feature quality and adaptability for fine-grained thoracic X-ray analysis, particularly classification and anatomical structure segmentation. Method: we systematically benchmark eight vision foundation models, spanning medical versus general pretraining, multi-scale versus modality-aligned architectures, and text-guided versus image-supervised alignment, using linear probing, full fine-tuning, and subgroup analysis on standard radiology datasets. Results: medical pretraining substantially improves linear probing performance but does not eliminate the need for fine-tuning in subtle lesion segmentation. Text-image alignment is not a prerequisite: label-supervised or purely image-based pretraining yields superior segmentation accuracy, and multi-scale architectural design proves more decisive than cross-modal alignment. Critically, the study reveals an intrinsic limitation of state-of-the-art MFMs in complex spatial localization; meanwhile, supervised end-to-end models now match or surpass leading foundation models in segmentation precision, challenging the assumption that foundation-model paradigms are necessary for medical imaging segmentation.
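The subgroup analysis mentioned above stratifies performance by a potential confounder (e.g., chest tubes, which co-occur with treated pneumothorax). A minimal sketch of that idea on fully synthetic data; the cohort, rates, and the "shortcut" classifier below are illustrative assumptions, not the paper's data or models:

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic cohort: chest-tube flag (confounder) and pneumothorax label.
# All names and rates here are hypothetical, chosen only to show the effect.
n = 1000
tube = rng.random(n) < 0.4
label = rng.random(n) < np.where(tube, 0.9, 0.5)  # tubes correlate with disease

# A "shortcut" classifier that keys on the tube rather than the lesion itself.
flip = rng.random(n) < 0.1
pred = np.where(flip, ~tube, tube)

def subgroup_accuracy(pred, label, group):
    """Classification accuracy within each level of a binary subgroup variable."""
    return {bool(g): float((pred[group == g] == label[group == g]).mean())
            for g in (False, True)}

acc = subgroup_accuracy(pred, label, tube)
print(acc)  # a large accuracy gap between subgroups flags shortcut behavior
```

A shortcut learner looks accurate overall yet collapses toward chance in the subgroup where the confounder is absent, which is exactly the failure mode the paper reports for precise segmentation.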

📝 Abstract
Foundation models (FMs) promise broad generalization across medical imaging tasks, but their effectiveness varies. It remains unclear how pre-training domain (medical vs. general), paradigm (e.g., text-guided), and architecture influence embedding quality, hindering the selection of optimal encoders for specific radiology tasks. To address this, we evaluate vision encoders from eight medical and general-domain FMs for chest X-ray analysis. We benchmark classification (pneumothorax, cardiomegaly) and segmentation (pneumothorax, cardiac boundary) using linear probing and fine-tuning. Our results show that domain-specific pre-training provides a significant advantage; medical FMs consistently outperformed general-domain models in linear probing, establishing superior initial feature quality. However, feature utility is highly task-dependent. Pre-trained embeddings were strong for global classification and for segmenting salient anatomy (e.g., heart). In contrast, for segmenting complex, subtle pathologies (e.g., pneumothorax), all FMs performed poorly without significant fine-tuning, revealing a critical gap in localizing subtle disease. Subgroup analysis showed FMs use confounding shortcuts (e.g., chest tubes for pneumothorax) for classification, a strategy that fails for precise segmentation. We also found that expensive text-image alignment is not a prerequisite; image-only (RAD-DINO) and label-supervised (Ark+) FMs were among the top performers. Notably, a supervised, end-to-end baseline remained highly competitive, matching or exceeding the best FMs on segmentation tasks. These findings show that while medical pre-training is beneficial, architectural choices (e.g., multi-scale) are critical, and pre-trained features are not universally effective, especially for complex localization tasks where supervised models remain a strong alternative.
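The linear-probing protocol described above freezes the pretrained encoder and fits only a linear head on its embeddings, so the score reflects raw feature quality. A minimal sketch with synthetic stand-in features; the dimensions, ridge penalty, and data are illustrative assumptions, not the paper's setup:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for frozen-encoder embeddings; in the paper these would
# come from a pretrained vision encoder (e.g., RAD-DINO) applied to X-rays.
n, d = 400, 64
X = rng.normal(size=(n, d))
w_true = rng.normal(size=d)
y = (X @ w_true + 0.1 * rng.normal(size=n) > 0).astype(float)

def linear_probe(X_train, y_train, l2=1.0):
    """Closed-form ridge-regularized linear head fit on frozen features."""
    d = X_train.shape[1]
    # Regress signed targets (+1/-1); the encoder itself is never updated.
    t = 2.0 * y_train - 1.0
    return np.linalg.solve(X_train.T @ X_train + l2 * np.eye(d), X_train.T @ t)

w = linear_probe(X[:300], y[:300])
acc = ((X[300:] @ w > 0).astype(float) == y[300:]).mean()
print(f"linear-probe test accuracy: {acc:.2f}")
```

Because only the linear head is trained, a high probe score indicates that the task is already linearly decodable from the frozen embeddings; full fine-tuning additionally updates the encoder weights, which is what the subtle-lesion segmentation tasks turned out to require.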
Problem

Research questions and friction points this paper is trying to address.

Evaluating how pre-training domain and architecture affect medical foundation model performance
Assessing feature quality for chest X-ray classification and segmentation tasks
Identifying limitations in localizing subtle pathologies without extensive fine-tuning
Innovation

Methods, ideas, or system contributions that make the work stand out.

Medical pretraining yields stronger off-the-shelf features than general-domain pretraining
Pre-trained feature utility is task-dependent; subtle-pathology segmentation still requires fine-tuning
Image-only (RAD-DINO) and label-supervised (Ark+) models rank among the top performers, without text-image alignment
Frank Li
Department of Radiology, Emory University, Atlanta, GA, USA
T. Dapamede
Department of Radiology, Emory University, Atlanta, GA, USA
Mohammadreza Chavoshi
MD, Postdoctoral Researcher, Emory University (Radiology, Meta-analysis, Artificial Intelligence)
Young Seok Jeon
National University of Singapore
Bardia Khosravi
Radiology Resident @ Yale (Radiology, Artificial Intelligence, Imaging Informatics)
Abdulhameed Dere
Faculty of Clinical Sciences, College of Health Sciences, University of Ilorin, Ilorin, Nigeria
Beatrice Brown-Mulry
Department of Radiology, Emory University, Atlanta, GA, USA
Rohan Isaac
Department of Radiology, Emory University, Atlanta, GA, USA
Aawez Mansuri
Department of Radiology, Emory University, Atlanta, GA, USA
C. Sanyika
Department of Radiology, Emory University, Atlanta, GA, USA
Janice M. Newsome
Department of Radiology, Emory University, Atlanta, GA, USA
S. Purkayastha
Luddy School of Informatics, Computing, and Engineering, Indiana University Indianapolis, Indianapolis, IN, USA
I. Banerjee
Department of Radiology, Mayo Clinic, Phoenix, AZ, USA
Hari Trivedi
Emory University (Deep Learning, Radiology, Mammography, AI, Natural Language Processing)
J. Gichoya
Department of Radiology, Emory University, Atlanta, GA, USA