DALPHIN: Benchmarking Digital Pathology AI Copilots Against Pathologists on an Open Multicentric Dataset

📅 2026-05-05

🤖 AI Summary
This study addresses the lack of open, multicentric benchmarks for evaluating AI copilots in routine digital pathology diagnostics. The authors introduce DALPHIN, the first open multicentric visual question answering (VQA) benchmark, comprising 1,236 images from 300 cases that span 130 diagnoses, 14 subspecialties, and six countries, alongside a human performance benchmark of 31 pathologists from 10 countries. With the ground truth sequestered and only indirectly accessible, the study evaluates GPT-5, Gemini 2.5 Pro, and the pathology-specific PathChat+ under both independent and sequential answer generation. PathChat+ shows no statistically significant difference from expert-level performance in four of six tasks, compared with two of six for Gemini 2.5 Pro and one of six for GPT-5. The data, methods, and evaluation platform are publicly accessible through dalphin.grand-challenge.org.
📝 Abstract
Foundation models with visual question answering capabilities for digital pathology are emerging. Such unprecedented technology requires independent benchmarking to assess its potential in assisting pathologists in routine diagnostics. We created DALPHIN, the first multicentric open benchmark for pathology AI copilots, comprising 1236 images from 300 cases, spanning 130 rare to common diagnoses, 6 countries, and 14 subspecialties. The DALPHIN design and dataset are introduced alongside a human performance benchmark of 31 pathologists from 10 countries with varying expertise. We report results for two general-purpose copilots (GPT-5, Gemini 2.5 Pro) and one pathology-specific copilot (PathChat+) for sequential and independent answer generation. We observed no statistically significant difference from expert-level performance in four of six tasks for PathChat+, two of six for Gemini 2.5 Pro, and one of six for GPT-5. DALPHIN is publicly released with sequestered, indirectly accessible ground truth to foster robust and enduring benchmarking. Data, methods, and the evaluation platform are accessible through dalphin.grand-challenge.org.
Problem

Research questions and friction points this paper is trying to address.

digital pathology
AI copilot
benchmarking
visual question answering
pathologist performance
Innovation

Methods, ideas, or system contributions that make the work stand out.

digital pathology
AI copilot
multicentric benchmark
visual question answering
foundation models
Carlijn Lems
Department of Pathology, Radboud University Medical Center, Nijmegen, The Netherlands
Sander Moonemans
Department of Pathology, Radboud University Medical Center, Nijmegen, The Netherlands
Natálie Klubíčková
Biopticka Laboratory Ltd., Pilsen, Czech Republic; Department of Pathology, Faculty of Medicine in Pilsen, Charles University, Pilsen, Czech Republic
Biagio Brattoli
Research scientist at Lunit, previously AWS. PhD at Heidelberg University
Computer Vision, Deep learning, Machine learning, Artificial Intelligence
Taebum Lee
Lunit Inc., Seoul, South Korea
Seokhwi Kim
Ajou University School of Medicine, Suwon, South Korea
Veronica Vilaplana
Associate professor at Universitat Politècnica de Catalunya (UPC)
Computer Vision, Image processing, Deep Learning, Machine Learning, Medical Imaging
Laura Pons
Department of Pathology, Hospital Universitari Germans Trias i Pujol, Badalona, Spain
Sapir Hochman
Department of Pathology, Hospital Universitari Germans Trias i Pujol, Badalona, Spain
Mauricio Eduardo Suárez-Franck
Department of Pathology, Hospital Universitari Germans Trias i Pujol, Badalona, Spain; Faculty of Medicine and Health Sciences, Universitat Autonoma de Barcelona, Barcelona, Spain
Pedro Luis Fernandez
Department of Pathology, Hospital Universitari Germans Trias i Pujol, Badalona, Spain; Faculty of Medicine and Health Sciences, Universitat Autonoma de Barcelona, Barcelona, Spain
Julius Drachneris
Vilnius University and National Centre of Pathology, Vilnius, Lithuania
Donatas Petroska
Vilnius University and National Centre of Pathology, Vilnius, Lithuania
Renaldas Augulis
Vilnius University and National Centre of Pathology, Vilnius, Lithuania
Arvydas Laurinavicius
Vilnius University
Digital Pathology, Kidney Pathology, Pathology Informatics
Domingos Oliveira
Research & Development Unit, IMP Diagnostics, Porto, Portugal
Diana Montezuma
Research & Development Unit, IMP Diagnostics, Porto, Portugal
Anouk B. Bouwmeester
Department of Pathology, Radboud University Medical Center, Nijmegen, The Netherlands
Dominique van Midden
Pathology resident, Radboud University Medical Center
Histopathology, Renal pathology, Dermatopathology, Transplant pathology, Computational pathology
Anne-Marie Vos
Department of Pathology, Radboud University Medical Center, Nijmegen, The Netherlands
Shoko Vos
Department of Pathology, Radboud University Medical Center, Nijmegen, The Netherlands
Jolique van Ipenburg
Department of Pathology, Radboud University Medical Center, Nijmegen, The Netherlands
Maschenka Balkenhol
Department of Pathology, Radboud University Medical Center, Nijmegen, The Netherlands; Canisius Wilhelmina Ziekenhuis, Nijmegen, The Netherlands
Koen Winkler
Department of Pathology, Radboud University Medical Center, Nijmegen, The Netherlands
Iris Nagtegaal
Department of Pathology, Radboud University Medical Center, Nijmegen, The Netherlands