🤖 AI Summary
The feature quality and adaptability of current medical foundation models (MFMs) have not been systematically evaluated for fine-grained thoracic X-ray analysis, particularly classification and anatomical structure segmentation.
Method: We systematically benchmark eight vision foundation models—spanning medical versus general pretraining, multi-scale versus modality-aligned architectures, and text-guided versus image-supervised alignment—using linear probing, full fine-tuning, and subgroup analysis on standard radiology datasets.
Results: Medical pretraining substantially improves linear probing performance but fails to eliminate the need for fine-tuning in subtle lesion segmentation. Text-image alignment is unnecessary; label-supervised or purely image-based pretraining yields superior segmentation accuracy. Multi-scale architectural design proves more decisive than cross-modal alignment. Critically, we reveal an intrinsic limitation of state-of-the-art MFMs in complex spatial localization tasks; meanwhile, supervised end-to-end models now match or surpass leading foundation models in segmentation precision—challenging prevailing assumptions about the necessity of foundation-model paradigms for medical imaging segmentation.
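The linear-probing protocol above trains only a lightweight head on top of a frozen encoder, so performance directly reflects the quality of the pretrained embeddings. A minimal sketch of that setup follows; the `LinearProbe` class, the toy stand-in encoder, and the 768-dimensional embedding size are illustrative assumptions, not the paper's exact implementation.

```python
# Sketch of linear probing: freeze the pretrained encoder, train only a
# linear head, so downstream accuracy measures raw embedding quality.
import torch
import torch.nn as nn

class LinearProbe(nn.Module):
    """Frozen vision encoder plus a trainable linear classification head."""
    def __init__(self, encoder: nn.Module, embed_dim: int, num_classes: int):
        super().__init__()
        self.encoder = encoder
        for p in self.encoder.parameters():
            p.requires_grad = False  # freeze: gradients flow only to the head
        self.head = nn.Linear(embed_dim, num_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        with torch.no_grad():
            feats = self.encoder(x)  # (B, embed_dim) pooled embedding
        return self.head(feats)

# Toy stand-in encoder (assumption): any FM backbone returning a pooled
# embedding (e.g. a ViT CLS token) would slot in here instead.
encoder = nn.Sequential(nn.Flatten(), nn.Linear(3 * 224 * 224, 768))
probe = LinearProbe(encoder, embed_dim=768, num_classes=2)

x = torch.randn(4, 3, 224, 224)       # batch of 4 fake chest X-rays
logits = probe(x)
print(logits.shape)                   # torch.Size([4, 2])
trainable = sum(p.numel() for p in probe.parameters() if p.requires_grad)
print(trainable)                      # only the head: 768 * 2 + 2 = 1538
```

Full fine-tuning differs only in skipping the freeze loop, letting gradients update the encoder as well, which is what the subtle-lesion segmentation results show is still required.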
📝 Abstract
Foundation models (FMs) promise broad generalization across medical imaging tasks, but their effectiveness varies. It remains unclear how pre-training domain (medical vs. general), paradigm (e.g., text-guided), and architecture influence embedding quality, hindering the selection of optimal encoders for specific radiology tasks. To address this, we evaluate vision encoders from eight medical and general-domain FMs for chest X-ray analysis. We benchmark classification (pneumothorax, cardiomegaly) and segmentation (pneumothorax, cardiac boundary) using linear probing and fine-tuning. Our results show that domain-specific pre-training provides a significant advantage; medical FMs consistently outperformed general-domain models in linear probing, establishing superior initial feature quality. However, feature utility is highly task-dependent. Pre-trained embeddings were strong for global classification and for segmenting salient anatomy (e.g., heart). In contrast, for segmenting complex, subtle pathologies (e.g., pneumothorax), all FMs performed poorly without substantial fine-tuning, revealing a critical gap in localizing subtle disease. Subgroup analysis showed that FMs exploit confounding shortcuts (e.g., chest tubes for pneumothorax) for classification, a strategy that fails for precise segmentation. We also found that expensive text-image alignment is not a prerequisite; image-only (RAD-DINO) and label-supervised (Ark+) FMs were among the top performers. Notably, a supervised, end-to-end baseline remained highly competitive, matching or exceeding the best FMs on segmentation tasks. These findings show that while medical pre-training is beneficial, architectural choices (e.g., multi-scale design) are critical, and pre-trained features are not universally effective, especially for complex localization tasks where supervised models remain a strong alternative.
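The segmentation benchmarks above are typically scored and trained with overlap-based objectives. As a concrete reference point, here is a minimal soft Dice loss, the standard choice for binary mask segmentation such as pneumothorax or cardiac-boundary delineation; the abstract does not specify the paper's exact loss, so treat this as a generic sketch rather than the authors' implementation.

```python
# Soft Dice loss for binary segmentation masks: 1 - Dice overlap,
# averaged over the batch. pred holds probabilities in [0, 1];
# target holds binary ground-truth masks of the same shape (B, 1, H, W).
import torch

def dice_loss(pred: torch.Tensor, target: torch.Tensor,
              eps: float = 1e-6) -> torch.Tensor:
    inter = (pred * target).sum(dim=(1, 2, 3))
    denom = pred.sum(dim=(1, 2, 3)) + target.sum(dim=(1, 2, 3))
    # eps keeps the ratio well-defined for empty masks (no lesion present)
    return 1.0 - ((2 * inter + eps) / (denom + eps)).mean()

pred = torch.sigmoid(torch.randn(2, 1, 64, 64))          # fake predictions
target = (torch.rand(2, 1, 64, 64) > 0.5).float()        # fake masks
loss = dice_loss(pred, target)                           # value in [0, 1]
perfect = dice_loss(target, target)                      # ~0 for exact match
```

Because Dice rewards only true spatial overlap, the classification shortcuts noted in the abstract (e.g., detecting chest tubes instead of the pneumothorax itself) earn no credit under this objective, which is consistent with FMs scoring well on classification yet poorly on subtle-lesion segmentation.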