🤖 AI Summary
Medical imaging analysis faces the dual challenges of scarce annotated data and poor cross-modal generalization. To address these, this work systematically reviews over 150 studies, presenting the first unified technical taxonomy of foundation models (FMs) across modalities (pathology, radiology, and ophthalmology) and clinical specialties. We comprehensively analyze model architectures (e.g., ViT) and self-supervised paradigms (e.g., masked modeling with MAE, contrastive learning with SimCLR), and evaluate lightweight adaptation strategies including prompt tuning, adapters, and LoRA. We propose a generalizable evaluation framework and a challenge taxonomy, distilling three universal design principles along with modality-specific implementation patterns. Key bottlenecks, namely data privacy, annotation efficiency, and clinical interpretability, are explicitly identified. Our synthesis provides both theoretical foundations and a concrete technical roadmap toward trustworthy, robust, and deployable next-generation medical FMs.
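To make the LoRA-style lightweight adaptation mentioned above concrete, here is a minimal NumPy sketch of the core idea: a frozen pretrained weight matrix is augmented with a trainable low-rank update. All dimensions, the rank `r`, and the scaling `alpha` are illustrative choices, not values taken from any reviewed model.

```python
import numpy as np

rng = np.random.default_rng(0)

d_in, d_out, r = 8, 8, 2  # hypothetical layer size; rank r << d
W = rng.normal(size=(d_out, d_in))     # frozen pretrained weight
A = rng.normal(size=(r, d_in)) * 0.01  # trainable down-projection
B = np.zeros((d_out, r))               # trainable up-projection, zero-init
alpha = 16                             # LoRA scaling hyperparameter

def lora_forward(x):
    # y = Wx + (alpha / r) * B(Ax); only A and B would receive gradients
    return W @ x + (alpha / r) * (B @ (A @ x))

x = rng.normal(size=d_in)
# Zero-initializing B means fine-tuning starts exactly at the pretrained model
assert np.allclose(lora_forward(x), W @ x)
# The adapter trains far fewer parameters than the full weight matrix
assert A.size + B.size < W.size
```

The zero-initialized `B` is the detail that makes this "lightweight": adaptation begins from the pretrained behavior, and only `r * (d_in + d_out)` parameters are updated instead of `d_in * d_out`.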
📝 Abstract
Foundation models (FMs) are changing the way medical images are analyzed by learning from large collections of unlabeled data. Instead of relying on manually annotated examples, FMs are pre-trained to learn general-purpose visual features that can later be adapted to specific clinical tasks with little additional supervision. In this review, we examine how FMs are being developed and applied in pathology, radiology, and ophthalmology, drawing on evidence from over 150 studies. We explain the core components of FM pipelines, including model architectures, self-supervised learning methods, and strategies for downstream adaptation. We also review how FMs are being used in each imaging domain and compare design choices across applications. Finally, we discuss key challenges and open questions to guide future research.
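As a toy illustration of the masked-modeling style of self-supervised pre-training discussed above: hide most patches of an image, then reconstruct them from the visible ones, with the loss computed only on the hidden patches. The patch count, dimensions, and the mean-based "predictor" below are stand-in assumptions; a real FM would use a ViT encoder/decoder here.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "image" represented as 16 patch embeddings of dimension 4
patches = rng.normal(size=(16, 4))
mask_ratio = 0.75  # MAE-style: mask a large fraction of patches

n_masked = int(len(patches) * mask_ratio)
masked_idx = rng.choice(len(patches), size=n_masked, replace=False)
visible_idx = np.setdiff1d(np.arange(len(patches)), masked_idx)

# Stand-in predictor: guess each hidden patch as the mean of the
# visible patches (placeholder for a learned encoder/decoder)
prediction = patches[visible_idx].mean(axis=0)

# Reconstruction loss is evaluated only on the masked patches,
# so the model must infer missing content rather than copy it
loss = np.mean((patches[masked_idx] - prediction) ** 2)
assert loss >= 0.0 and visible_idx.size == 4
```

The key property is that no manual labels appear anywhere: the supervision signal is manufactured from the image itself, which is what lets FMs pre-train on large unlabeled collections.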