Designing UNICORN: a Unified Benchmark for Imaging in Computational Pathology, Radiology, and Natural Language

📅 2026-03-03
🤖 AI Summary
This work addresses the absence of a unified, publicly available, and reproducible evaluation framework for medical foundation models, which hinders systematic assessment of their generalization across tasks, modalities, and anatomical regions. To this end, we propose UNICORN, a benchmark built on a two-stage decoupled architecture that evaluates model representation quality under a standardized few-shot adaptation protocol, using sequestered test sets derived from clinically relevant cohorts. UNICORN integrates multimodal data (pathology, radiology imaging, and clinical text) spanning multiple tasks and institutions, eight anatomical regions, four imaging modalities, and over 2,400 patients. It introduces the UNICORN Score, a novel composite metric, and provides an open platform with a public leaderboard to support transparent, reproducible, and comprehensive evaluation of medical foundation models.

📝 Abstract
Medical foundation models show promise in learning broadly generalizable features from large, diverse datasets. Such features could form the basis for reliable cross-modality generalization and rapid adaptation to new tasks from only a few task-specific examples. Yet, evidence for this is limited by the lack of public, standardized, and reproducible evaluation frameworks, as existing public benchmarks are often fragmented across task-, organ-, or modality-specific settings, limiting assessment of cross-task generalization. We introduce UNICORN, a public benchmark designed to systematically evaluate medical foundation models under a unified protocol. To isolate representation quality, we built the benchmark on a novel two-step framework that decouples model inference from task-specific evaluation based on standardized few-shot adaptation. As a central design choice, we constructed indirectly accessible sequestered test sets derived from clinically relevant cohorts, along with standardized evaluation code and a submission interface on an open benchmarking platform. Performance is aggregated into a single UNICORN Score, a new metric that we introduce to support direct comparison of foundation models across diverse medical domains, modalities, and task types. The UNICORN test dataset includes data from more than 2,400 patients, including over 3,700 vision cases and over 2,400 clinical reports collected from 17 institutions across eight countries. The benchmark spans eight anatomical regions and four imaging modalities. Both task-specific and aggregated leaderboards enable accessible, standardized, and reproducible evaluation. By standardizing multi-task, multi-modality assessment, UNICORN establishes a foundation for reproducible benchmarking of medical foundation models. Data, baseline methods, and the evaluation platform are publicly available via unicorn.grand-challenge.org.
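The abstract describes aggregating heterogeneous per-task results into a single UNICORN Score but does not spell out the aggregation here. As a minimal sketch only (the paper defines the actual metric; the normalize-then-average scheme, task names, and baseline values below are all hypothetical), one common way to combine metrics on different scales is:

```python
# Hypothetical sketch of a composite benchmark score: min-max-normalize each
# task's raw metric against per-task floor/ceiling baselines, then average.
# This illustrates the general idea of an aggregated leaderboard score; it is
# NOT the UNICORN Score definition from the paper.

def composite_score(task_metrics, baselines):
    """Average each task's metric after min-max normalization.

    task_metrics: {task_name: raw_metric}
    baselines:    {task_name: (floor, ceiling)}, e.g. chance vs. perfect
    """
    normalized = []
    for task, value in task_metrics.items():
        floor, ceiling = baselines[task]
        score = (value - floor) / (ceiling - floor)
        # Clip so a model below its floor cannot contribute a negative score.
        normalized.append(min(max(score, 0.0), 1.0))
    return sum(normalized) / len(normalized)

# Illustrative (made-up) per-task metrics and baselines:
metrics = {"pathology_cls": 0.82, "radiology_seg": 0.61, "report_gen": 0.45}
bounds = {"pathology_cls": (0.5, 1.0),   # AUROC, chance = 0.5
          "radiology_seg": (0.0, 1.0),   # Dice
          "report_gen": (0.2, 1.0)}      # text metric with nonzero floor
print(round(composite_score(metrics, bounds), 3))  # → 0.521
```

Normalizing against a per-task chance baseline keeps easy tasks (where every model scores highly in raw terms) from dominating the average, which matters when mixing classification, segmentation, and language tasks on one leaderboard.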
Problem

Research questions and friction points this paper is trying to address.

medical foundation models
cross-modality generalization
benchmark
computational pathology
few-shot adaptation
Innovation

Methods, ideas, or system contributions that make the work stand out.

medical foundation models
unified benchmark
few-shot adaptation
cross-modality generalization
reproducible evaluation
Michelle Stegeman
Department of Pathology, Radboud University Medical Center, Nijmegen, The Netherlands
Lena Philipp
Department of Medical Imaging, Radboud University Medical Center, Nijmegen, The Netherlands
Fennie van der Graaf
Department of Medical Imaging, Radboud University Medical Center, Nijmegen, The Netherlands
Marina D'Amato
PhD candidate, Radboudumc
Medical Imaging, Deep Learning, Computational Pathology, Computer Vision
Clément Grisi
PhD Candidate, Radboudumc
computer vision, deep learning, computational pathology
Luc Builtjes
Department of Medical Imaging, Radboud University Medical Center, Nijmegen, The Netherlands
Joeran S. Bosma
Department of Medical Imaging, Radboud University Medical Center, Nijmegen, The Netherlands
Judith Lefkes
PhD Candidate, Computational Pathology Group, Radboudumc
Machine Learning, Computational Pathology, Computational Neuroscience
Rianne A. Weber
Department of Medical Imaging, Radboud University Medical Center, Nijmegen, The Netherlands
James A. Meakin
Department of Medical Imaging, Radboud University Medical Center, Nijmegen, The Netherlands
Thomas Koopman
Informatics Institute, Faculty of Science, University of Amsterdam, Amsterdam, The Netherlands
Anne Mickan
Department of Medical Imaging, Radboud University Medical Center, Nijmegen, The Netherlands
Mathias Prokop
Professor of Radiology, Radboudumc
Computed tomography, computer aided diagnosis, lung cancer, stroke
Ewoud J. Smit
Department of Medical Imaging, Radboud University Medical Center, Nijmegen, The Netherlands
Geert Litjens
Radboud University Medical Center
digital pathology, computer-aided detection, MRI, prostate cancer, breast cancer
Jeroen van der Laak
Radboud University Medical Center
Digital Pathology, Computational Pathology, Deep Learning, Image Analysis
Bram van Ginneken
Professor of Medical Image Analysis, Radboud University
Medical Image Analysis, Medical Imaging, Deep Learning, Computer-Aided Diagnosis
Maarten de Rooij
Department of Medical Imaging, Radboud University Medical Center, Nijmegen, The Netherlands
Henkjan Huisman
Professor Medical Imaging AI, Radboud University Medical Centre Nijmegen and NTNU Norway
Artificial intelligence, Pelvic/Abdominal Cancer, MRI/Ultrasound
Colin Jacobs
Associate Professor in AI for Thoracic Oncology, Radboudumc, Nijmegen, The Netherlands
Medical Image Analysis, Machine Learning, Computer-aided Diagnosis, Deep Learning, Medical Imaging
Francesco Ciompi
Radboud University Medical Center, Nijmegen
Deep Learning, Computational Pathology, Medical Image Analysis, Computer Aided Diagnosis
Alessa Hering
Radboud University Medical Center
Deep Learning, Image Registration, Tumor Follow-Up, LLM