BioVFM-21M: Benchmarking and Scaling Self-Supervised Vision Foundation Models for Biomedical Image Analysis

📅 2025-05-14
📈 Citations: 0
Influential: 0
🤖 AI Summary
Scalability of self-supervised vision foundation models (VFMs) in medical image analysis remains poorly understood due to fragmented benchmarks and insufficient analysis of scaling factors. Method: We propose a systematic scaling law analysis framework and introduce BioVFM-21M—a large-scale, multimodal, multi-anatomic benchmark comprising 21 million medical images. Leveraging this resource, we quantitatively characterize the synergistic effects of model size, data volume, imaging modality, and algorithmic design on performance. Based on these insights, we develop BioVFM, the first biomedical VFM built upon a Vision Transformer (ViT) architecture, integrating masked autoencoding (MAE), SimCLR, and cross-modal self-supervised pretraining. Contribution/Results: BioVFM achieves state-of-the-art performance across 12 diverse downstream tasks. Our analysis rigorously validates scaling effectiveness and quantifies critical constraints—task specificity, data diversity, and computational efficiency—on performance gains, establishing foundational principles for scalable biomedical VFMs.

📝 Abstract
Scaling up model and data size has demonstrated impressive performance improvements over a wide range of tasks. Despite extensive studies of scaling behavior on general-purpose tasks, medical images differ substantially from natural data, and the key factors in developing medical vision foundation models at scale remain unclear due to the absence of a thorough understanding of scaling behavior in the medical domain. In this paper, we explore scaling behavior across model sizes, training algorithms, data sizes, and imaging modalities in developing scalable medical vision foundation models via self-supervised learning. To support scalable pretraining, we introduce BioVFM-21M, a large-scale biomedical image dataset encompassing a wide range of biomedical imaging modalities and anatomies. We observe that scaling up does provide benefits, but these benefits vary across tasks. Further analysis reveals several factors correlated with scaling benefits. Finally, we propose BioVFM, a large-scale medical vision foundation model pretrained on 21 million biomedical images, which outperforms previous state-of-the-art foundation models across 12 medical benchmarks. Our results highlight that while scaling up is beneficial for pursuing better performance, task characteristics, data diversity, pretraining methods, and computational efficiency remain critical considerations for developing scalable medical foundation models.
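The abstract describes self-supervised pretraining, and the summary names masked autoencoding (MAE) as one ingredient. The sketch below illustrates the core MAE preprocessing step — splitting an image into patches and randomly masking most of them so the encoder only sees the visible subset. The 16-pixel patch size and 0.75 mask ratio are common MAE defaults used here as illustrative assumptions, not the paper's reported settings.

```python
# Minimal sketch of MAE-style random patch masking (illustrative; not the
# paper's implementation). Patch size and mask ratio are assumed defaults.
import numpy as np

def patchify(image, patch=16):
    """Split a square image (H, W, C) into non-overlapping flattened patches."""
    h, w, c = image.shape
    assert h % patch == 0 and w % patch == 0
    patches = image.reshape(h // patch, patch, w // patch, patch, c)
    # Reorder so each row of the output is one flattened patch.
    patches = patches.transpose(0, 2, 1, 3, 4)
    return patches.reshape(-1, patch * patch * c)

def random_mask(patches, mask_ratio=0.75, rng=None):
    """Keep a random subset of patches; return kept patches and their indices."""
    rng = rng or np.random.default_rng(0)
    n = patches.shape[0]
    n_keep = int(n * (1 - mask_ratio))
    keep_idx = np.sort(rng.permutation(n)[:n_keep])
    return patches[keep_idx], keep_idx

image = np.zeros((224, 224, 3), dtype=np.float32)  # a 224x224 RGB input
patches = patchify(image)                  # shape (196, 768)
visible, keep_idx = random_mask(patches)   # ~25% of patches kept
```

In full MAE pretraining, only `visible` is fed to the ViT encoder, and a lightweight decoder reconstructs the masked patches; the reconstruction loss drives representation learning without labels.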
Problem

Research questions and friction points this paper is trying to address.

Understanding scaling behavior in medical vision foundation models
Developing large-scale biomedical image dataset for pretraining
Evaluating performance of scalable medical foundation models
Innovation

Methods, ideas, or system contributions that make the work stand out.

Scaling model and data size for medical images
Introducing BioVFM-21M large-scale biomedical dataset
Developing BioVFM foundation model via self-supervised learning
Jiarun Liu
Paul C. Lauterbur Research Center for Biomedical Imaging, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China; Pengcheng Laboratory, Shenzhen, China; University of Chinese Academy of Sciences, Beijing, China
Hong-Yu Zhou
Assistant Professor of Biomedical Engineering, Tsinghua University. Past: Harvard Medical School.
AI for Healthcare · AI for Medicine · Biomedical AI
Weijian Huang
Paul C. Lauterbur Research Center for Biomedical Imaging, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China; Pengcheng Laboratory, Shenzhen, China; University of Chinese Academy of Sciences, Beijing, China
Hao Yang
Paul C. Lauterbur Research Center for Biomedical Imaging, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China; Pengcheng Laboratory, Shenzhen, China; University of Chinese Academy of Sciences, Beijing, China
Dongning Song
Paul C. Lauterbur Research Center for Biomedical Imaging, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China; Pengcheng Laboratory, Shenzhen, China; University of Chinese Academy of Sciences, Beijing, China
Tao Tan
FCA MPU
Medical Imaging AI
Yong Liang
University of Chinese Academy of Sciences, Beijing, China
Shanshan Wang
Paul C. Lauterbur Research Center for Biomedical Imaging, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China