🤖 AI Summary
Existing radiological content-based image retrieval (CBIR) systems are typically disease-specific and generalize poorly across pathologies and imaging modalities. To address this limitation, we use vision foundation models, including ViT, CLIP, and DINOv2, as off-the-shelf feature extractors for general-purpose medical image retrieval, without fine-tuning. The benchmark covers four imaging modalities (X-ray, CT, MRI, and ultrasound) and 161 distinct pathological categories. On a large-scale dataset of 1.6 million 2D radiological images, weakly supervised models perform best, reaching a precision at 1 (P@1) of up to 0.594, which is competitive with a specialized, task-specific model. Our analysis further shows that retrieving pathological features is harder than retrieving anatomical structures. These results indicate that foundation models generalize well to large-scale, multi-disease CBIR and point toward universal medical image retrieval systems.
📝 Abstract
Content-based image retrieval (CBIR) has the potential to significantly improve diagnostic support and medical research in radiology. Current CBIR systems, however, are typically specialized to certain pathologies, which limits their utility. In response, we propose using vision foundation models as powerful and versatile off-the-shelf feature extractors for content-based medical image retrieval. By benchmarking these models on a comprehensive dataset of 1.6 million 2D radiological images spanning four modalities and 161 pathologies, we identify weakly supervised models as superior, achieving a P@1 of up to 0.594. This performance is competitive with a specialized model, yet requires no fine-tuning. Our analysis further examines the retrieval of pathological versus anatomical structures and shows that pathological features are harder to retrieve accurately. Despite these challenges, our research underscores the potential of foundation models for CBIR in radiology and motivates a shift toward versatile, general-purpose medical image retrieval systems that do not require task-specific tuning.
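The retrieval setup and the P@1 metric described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: embeddings are assumed to come from a frozen foundation model (stubbed here with hand-written 2-D vectors), similarity is assumed to be cosine, and a hit is assumed to mean the retrieved image carries exactly the query's label. The function names and toy labels are hypothetical; the abstract does not specify the similarity measure or the label-matching rule.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two embedding vectors (assumed metric)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def precision_at_k(
    query_embs: Sequence[Sequence[float]],
    query_labels: Sequence[str],
    db_embs: Sequence[Sequence[float]],
    db_labels: Sequence[str],
    k: int = 1,
) -> float:
    """Mean fraction of the top-k retrieved database items sharing the query's label.

    With k=1 this is P@1: the share of queries whose single nearest neighbour
    has the same label (hypothetical rule: exact label equality).
    """
    total = 0.0
    for q_emb, q_label in zip(query_embs, query_labels):
        # Brute-force ranking of every database item, most similar first.
        ranked = sorted(
            range(len(db_embs)),
            key=lambda i: cosine_similarity(q_emb, db_embs[i]),
            reverse=True,
        )
        hits = sum(1 for i in ranked[:k] if db_labels[i] == q_label)
        total += hits / k
    return total / len(query_embs)


# Toy 2-D "embeddings" standing in for frozen foundation-model features.
db_embs = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
db_labels = ["pneumonia", "pneumonia", "fracture", "fracture"]
query_embs = [[0.95, 0.05], [0.2, 0.8], [0.6, 0.4]]
query_labels = ["pneumonia", "fracture", "fracture"]

# The third query's nearest neighbour is a "pneumonia" image, so 2 of 3 queries hit.
print(precision_at_k(query_embs, query_labels, db_embs, db_labels, k=1))  # 0.6666666666666666
```

The brute-force ranking is linear in the database size per query; at the 1.6-million-image scale mentioned in the abstract, a practical system would more likely use an approximate nearest-neighbour index (for example FAISS), although the abstract does not state which search backend was used.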