Leveraging Foundation Models for Content-Based Medical Image Retrieval in Radiology

📅 2024-03-11

🏛️ arXiv.org

📈 Citations: 8

✨ Influential: 1

🤖 AI Summary

Existing content-based image retrieval (CBIR) systems in radiology are typically specialized to particular pathologies, which limits their utility. This paper instead proposes using off-the-shelf vision foundation models, including ViT, CLIP, and DINOv2, as feature extractors without fine-tuning. The models are benchmarked on a dataset of 1.6 million 2D radiological images spanning four imaging modalities (X-ray, CT, MRI, ultrasound) and 161 pathologies. Weakly supervised models perform best, reaching a top-1 precision (P@1) of up to 0.594, which is competitive with a specialized model. The analysis also indicates that retrieving pathological features is harder than retrieving anatomical structures. Overall, the results suggest that foundation models can serve as the basis for versatile, general-purpose medical image retrieval systems.

📝 Abstract

Content-based image retrieval (CBIR) has the potential to significantly improve diagnostic aid and medical research in radiology. Current CBIR systems face limitations due to their specialization to certain pathologies, limiting their utility. In response, we propose using vision foundation models as powerful and versatile off-the-shelf feature extractors for content-based medical image retrieval. By benchmarking these models on a comprehensive dataset of 1.6 million 2D radiological images spanning four modalities and 161 pathologies, we identify weakly-supervised models as superior, achieving a P@1 of up to 0.594. This performance not only competes with a specialized model but does so without the need for fine-tuning. Our analysis further explores the challenges in retrieving pathological versus anatomical structures, indicating that accurate retrieval of pathological features presents greater difficulty. Despite these challenges, our research underscores the vast potential of foundation models for CBIR in radiology, proposing a shift towards versatile, general-purpose medical image retrieval systems that do not require specific tuning.
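To make the P@1 figure concrete, here is a minimal sketch of how top-1 precision can be computed over precomputed embeddings. It assumes cosine similarity and a leave-one-out setup in which every image is queried against all the others; the abstract does not state the paper's exact similarity measure or query/database split, so both are assumptions. The function name `precision_at_1` and the toy vectors and labels are hypothetical.

```python
import numpy as np

def precision_at_1(embeddings, labels):
    """Leave-one-out P@1: share of queries whose nearest neighbour has the same label.

    Assumption: cosine similarity over feature vectors produced by some
    frozen feature extractor (not shown here).
    """
    x = np.asarray(embeddings, dtype=np.float64)
    y = np.asarray(labels)
    # L2-normalise rows so the dot product equals cosine similarity.
    x = x / np.linalg.norm(x, axis=1, keepdims=True)
    sim = x @ x.T
    # Exclude each query from its own result list.
    np.fill_diagonal(sim, -np.inf)
    nearest = sim.argmax(axis=1)
    return float(np.mean(y[nearest] == y))

# Toy data (hypothetical): five 2-D "embeddings" with class labels.
features = np.array([
    [1.0, 0.0],
    [0.9, 0.1],
    [0.0, 1.0],
    [0.1, 0.9],
    [0.8, 0.6],
])
labels = np.array(["CT", "CT", "MRI", "MRI", "MRI"])

score = precision_at_1(features, labels)
print(f"P@1 = {score:.3f}")
```

In this toy case four of the five queries retrieve a neighbour with a matching label, so the score is 0.8. At the scale described in the abstract, the dense similarity matrix would not fit in memory, and an approximate nearest-neighbour index would be the usual substitute.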

Problem

Research questions and friction points this paper is trying to address.

Enhancing radiology diagnostics with foundation models for CBIR

Overcoming limitations of specialized CBIR systems with versatile models

Evaluating foundation models' performance on diverse radiological images

Innovation

Methods, ideas, or system contributions that make the work stand out.

Using vision foundation models for image retrieval

Benchmarking models on 1.6M radiological images

Identifying BiomedCLIP as a highly effective model

🔎 Similar Papers

RadCLIP: Enhancing Radiologic Image Analysis through Contrastive Language-Image Pre-training