EvalBlocks: A Modular Pipeline for Rapidly Evaluating Foundation Models in Medical Imaging

📅 2026-01-07
🏛️ arXiv.org
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the lack of efficient, reproducible, and standardized evaluation protocols for foundation models in medical imaging, a gap that hinders rapid research iteration. To this end, we propose the first modular evaluation framework tailored to this domain, built on Snakemake to enable plug-and-play pipelines that flexibly integrate diverse datasets, models, and evaluation strategies. The framework incorporates centralized experiment tracking, caching, and parallelized computation to improve efficiency and reproducibility. Experiments across five state-of-the-art foundation models and three medical image classification tasks demonstrate that the framework substantially improves evaluation throughput and reproducibility, offering scalable infrastructure for advancing research on foundation models in medical imaging.

📝 Abstract
Developing foundation models in medical imaging requires continuous monitoring of downstream performance. Researchers are burdened with tracking numerous experiments, design choices, and their effects on performance, often relying on ad-hoc, manual workflows that are inherently slow and error-prone. We introduce EvalBlocks, a modular, plug-and-play framework for efficient evaluation of foundation models during development. Built on Snakemake, EvalBlocks supports seamless integration of new datasets, foundation models, aggregation methods, and evaluation strategies. All experiments and results are tracked centrally and are reproducible with a single command, while efficient caching and parallel execution enable scalable use on shared compute infrastructure. Demonstrated on five state-of-the-art foundation models and three medical imaging classification tasks, EvalBlocks streamlines model evaluation, enabling researchers to iterate faster and focus on model innovation rather than evaluation logistics. The framework is released as open source software at https://github.com/DIAGNijmegen/eval-blocks.
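The plug-and-play composition the abstract describes — interchangeable datasets, models, and evaluation strategies, with caching so repeated experiments are cheap — can be sketched in a few lines. This is an illustrative toy, not EvalBlocks' actual API: all names (`REGISTRY`, `register`, `run`, the toy blocks) are hypothetical.

```python
from functools import lru_cache

# Hypothetical sketch: each "block" (dataset, model, evaluator) is registered
# by name, and an experiment is just a combination of block names, so new
# components plug in without touching the pipeline logic.
REGISTRY = {"dataset": {}, "model": {}, "evaluator": {}}

def register(kind, name):
    """Decorator that registers a block function under a kind and name."""
    def wrap(fn):
        REGISTRY[kind][name] = fn
        return fn
    return wrap

@register("dataset", "toy_xray")
def toy_xray():
    # (features, labels) stand-in for an embedded medical imaging dataset
    return [0.1, 0.9, 0.8, 0.2], [0, 1, 1, 0]

@register("model", "threshold")
def threshold_model(x):
    # Stand-in for foundation-model features plus a linear probe
    return 1 if x > 0.5 else 0

@register("evaluator", "accuracy")
def accuracy(preds, labels):
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)

@lru_cache(maxsize=None)  # cache results: re-running an experiment is free
def run(dataset, model, evaluator):
    xs, ys = REGISTRY["dataset"][dataset]()
    preds = [REGISTRY["model"][model](x) for x in xs]
    return REGISTRY["evaluator"][evaluator](preds, ys)

print(run("toy_xray", "threshold", "accuracy"))  # -> 1.0
```

In the actual framework, Snakemake plays the role of `run` here: each block becomes a rule, Snakemake's dependency tracking provides the caching, and its scheduler provides the parallel execution on shared compute.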
Problem

Research questions and friction points this paper addresses.

foundation models
medical imaging
model evaluation
reproducibility
workflow automation
Innovation

Methods, ideas, or system contributions that make the work stand out.

modular evaluation
foundation models
medical imaging
reproducible research
Snakemake
Jan Tagscherer
Sarah de Boer
Lena Philipp
Fennie van der Graaf
Dré Peeters
Joeran Bosma
Lars Leijten
Bogdan Obreja
Ewoud Smit
Diagnostic Image Analysis Group, Radboud University Medical Center, Nijmegen, The Netherlands
Alessa Hering
Radboud University Medical Center