BioVFM-21M: Benchmarking and Scaling Self-Supervised Vision Foundation Models for Biomedical Image Analysis

📅 2025-05-14
📈 Citations: 0
Influential: 0
🤖 AI Summary
Scalability of self-supervised vision foundation models (VFMs) in medical image analysis remains poorly understood due to fragmented benchmarks and insufficient analysis of scaling factors. Method: We propose a systematic scaling law analysis framework and introduce BioVFM-21M—a large-scale, multimodal, multi-anatomic benchmark comprising 21 million medical images. Leveraging this resource, we quantitatively characterize the synergistic effects of model size, data volume, imaging modality, and algorithmic design on performance. Based on these insights, we develop BioVFM, the first biomedical VFM built upon a Vision Transformer (ViT) architecture, integrating masked autoencoding (MAE), SimCLR, and cross-modal self-supervised pretraining. Contribution/Results: BioVFM achieves state-of-the-art performance across 12 diverse downstream tasks. Our analysis rigorously validates scaling effectiveness and quantifies critical constraints—task specificity, data diversity, and computational efficiency—on performance gains, establishing foundational principles for scalable biomedical VFMs.

📝 Abstract
Scaling up model and data size has demonstrated impressive performance improvements over a wide range of tasks. Despite extensive studies of scaling behavior on general-purpose tasks, medical images differ substantially from natural data, and the key factors in developing medical vision foundation models at scale remain unclear due to the absence of a thorough understanding of scaling behavior in the medical domain. In this paper, we explore scaling behavior across model sizes, training algorithms, data sizes, and imaging modalities in developing scalable medical vision foundation models via self-supervised learning. To support scalable pretraining, we introduce BioVFM-21M, a large-scale biomedical image dataset encompassing a wide range of biomedical imaging modalities and anatomies. We observe that scaling up does provide benefits, but these benefits vary across tasks. Further analysis reveals several factors correlated with scaling benefits. Finally, we propose BioVFM, a large-scale medical vision foundation model pretrained on 21 million biomedical images, which outperforms previous state-of-the-art foundation models across 12 medical benchmarks. Our results highlight that while scaling up is beneficial for pursuing better performance, task characteristics, data diversity, pretraining methods, and computational efficiency remain critical considerations for developing scalable medical foundation models.
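The abstract describes self-supervised pretraining, and the summary names masked autoencoding (MAE) as one ingredient. The sketch below illustrates the core MAE preprocessing step — splitting an image into patches and randomly masking most of them so the encoder only sees the visible subset. The 16-pixel patch size and 0.75 mask ratio are common MAE defaults used here as illustrative assumptions, not the paper's reported settings.

```python
# Minimal sketch of MAE-style random patch masking (illustrative; not the
# paper's implementation). Patch size and mask ratio are assumed defaults.
import numpy as np

def patchify(image, patch=16):
    """Split a square image (H, W, C) into non-overlapping flattened patches."""
    h, w, c = image.shape
    assert h % patch == 0 and w % patch == 0
    patches = image.reshape(h // patch, patch, w // patch, patch, c)
    # Reorder so each row of the output is one flattened patch.
    patches = patches.transpose(0, 2, 1, 3, 4)
    return patches.reshape(-1, patch * patch * c)

def random_mask(patches, mask_ratio=0.75, rng=None):
    """Keep a random subset of patches; return kept patches and their indices."""
    rng = rng or np.random.default_rng(0)
    n = patches.shape[0]
    n_keep = int(n * (1 - mask_ratio))
    keep_idx = np.sort(rng.permutation(n)[:n_keep])
    return patches[keep_idx], keep_idx

image = np.zeros((224, 224, 3), dtype=np.float32)  # a 224x224 RGB input
patches = patchify(image)                  # shape (196, 768)
visible, keep_idx = random_mask(patches)   # ~25% of patches kept
```

In full MAE pretraining, only `visible` is fed to the ViT encoder, and a lightweight decoder reconstructs the masked patches; the reconstruction loss drives representation learning without labels.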
Problem

Research questions and friction points this paper is trying to address.

Understanding scaling behavior in medical vision foundation models
Developing large-scale biomedical image dataset for pretraining
Evaluating performance of scalable medical foundation models
Innovation

Methods, ideas, or system contributions that make the work stand out.

Scaling model and data size for medical images
Introducing BioVFM-21M large-scale biomedical dataset
Developing BioVFM foundation model via self-supervised learning
Jiarun Liu
Paul C. Lauterbur Research Center for Biomedical Imaging, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China; Pengcheng Laboratory, Shenzhen, China; University of Chinese Academy of Sciences, Beijing, China
Hong-Yu Zhou
Assistant Professor of Biomedical Engineering, Tsinghua University. Past: Harvard Medical School.
AI for Healthcare · AI for Medicine · Biomedical AI
Weijian Huang
Paul C. Lauterbur Research Center for Biomedical Imaging, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China; Pengcheng Laboratory, Shenzhen, China; University of Chinese Academy of Sciences, Beijing, China
Hao Yang
Paul C. Lauterbur Research Center for Biomedical Imaging, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China; Pengcheng Laboratory, Shenzhen, China; University of Chinese Academy of Sciences, Beijing, China
Dongning Song
Paul C. Lauterbur Research Center for Biomedical Imaging, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China; Pengcheng Laboratory, Shenzhen, China; University of Chinese Academy of Sciences, Beijing, China
Tao Tan
FCA MPU
Medical Imaging AI
Yong Liang
University of Chinese Academy of Sciences, Beijing, China
Shanshan Wang
Paul C. Lauterbur Research Center for Biomedical Imaging, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China