EvalBlocks: A Modular Pipeline for Rapidly Evaluating Foundation Models in Medical Imaging

📅 2026-01-07
🏛️ arXiv.org
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the lack of efficient, reproducible, and standardized evaluation protocols for foundation models in medical imaging, a gap that hinders rapid research iteration. To this end, we propose the first modular evaluation framework tailored to this domain, built on Snakemake to enable plug-and-play pipelines that flexibly integrate diverse datasets, models, and evaluation strategies. The framework incorporates centralized experiment tracking, caching, and parallelized computation to improve efficiency and reproducibility. Experiments across five state-of-the-art foundation models and three medical image classification tasks demonstrate that the framework substantially improves evaluation throughput and reproducibility, offering scalable infrastructure for advancing research on foundation models in medical imaging.

📝 Abstract
Developing foundation models in medical imaging requires continuous monitoring of downstream performance. Researchers are burdened with tracking numerous experiments, design choices, and their effects on performance, often relying on ad-hoc, manual workflows that are inherently slow and error-prone. We introduce EvalBlocks, a modular, plug-and-play framework for efficient evaluation of foundation models during development. Built on Snakemake, EvalBlocks supports seamless integration of new datasets, foundation models, aggregation methods, and evaluation strategies. All experiments and results are tracked centrally and are reproducible with a single command, while efficient caching and parallel execution enable scalable use on shared compute infrastructure. Demonstrated on five state-of-the-art foundation models and three medical imaging classification tasks, EvalBlocks streamlines model evaluation, enabling researchers to iterate faster and focus on model innovation rather than evaluation logistics. The framework is released as open source software at https://github.com/DIAGNijmegen/eval-blocks.
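The plug-and-play composition the abstract describes — interchangeable datasets, models, and evaluation strategies, with caching so repeated experiments are cheap — can be sketched in a few lines. This is an illustrative toy, not EvalBlocks' actual API: all names (`REGISTRY`, `register`, `run`, the toy blocks) are hypothetical.

```python
from functools import lru_cache

# Hypothetical sketch: each "block" (dataset, model, evaluator) is registered
# by name, and an experiment is just a combination of block names, so new
# components plug in without touching the pipeline logic.
REGISTRY = {"dataset": {}, "model": {}, "evaluator": {}}

def register(kind, name):
    """Decorator that registers a block function under a kind and name."""
    def wrap(fn):
        REGISTRY[kind][name] = fn
        return fn
    return wrap

@register("dataset", "toy_xray")
def toy_xray():
    # (features, labels) stand-in for an embedded medical imaging dataset
    return [0.1, 0.9, 0.8, 0.2], [0, 1, 1, 0]

@register("model", "threshold")
def threshold_model(x):
    # Stand-in for foundation-model features plus a linear probe
    return 1 if x > 0.5 else 0

@register("evaluator", "accuracy")
def accuracy(preds, labels):
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)

@lru_cache(maxsize=None)  # cache results: re-running an experiment is free
def run(dataset, model, evaluator):
    xs, ys = REGISTRY["dataset"][dataset]()
    preds = [REGISTRY["model"][model](x) for x in xs]
    return REGISTRY["evaluator"][evaluator](preds, ys)

print(run("toy_xray", "threshold", "accuracy"))  # -> 1.0
```

In the actual framework, Snakemake plays the role of `run` here: each block becomes a rule, Snakemake's dependency tracking provides the caching, and its scheduler provides the parallel execution on shared compute.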
Problem

Research questions and friction points this paper addresses.

foundation models
medical imaging
model evaluation
reproducibility
workflow automation
Innovation

Methods, ideas, or system contributions that make the work stand out.

modular evaluation
foundation models
medical imaging
reproducible research
Snakemake
Jan Tagscherer
Sarah de Boer
Lena Philipp
Fennie van der Graaf
Dré Peeters
Joeran Bosma
Lars Leijten
Bogdan Obreja
Ewoud Smit
Diagnostic Image Analysis Group, Radboud University Medical Center, Nijmegen, The Netherlands
Alessa Hering
Radboud University Medical Center