PathBench: A comprehensive comparison benchmark for pathology foundation models towards precision oncology

📅 2025-05-26
📈 Citations: 0
Influential: 0
🤖 AI Summary
Clinical translation of pathology foundation models is hindered by cancer-type specificity, evaluation data leakage risks, and the absence of standardized benchmarks. To address these challenges, we introduce PathBench, the first comprehensive benchmark for precision oncology, spanning the full clinical workflow from diagnosis to prognosis. It comprises 15,888 multi-institutional, private whole-slide images (WSIs) from 10 hospitals (8,549 patients) and 64 diverse tasks, with strict pretraining–evaluation data isolation. We propose a standardized, multi-cancer, end-to-end, leakage-resistant evaluation framework, integrating an automated real-time leaderboard and a multi-task assessment pipeline. Systematic evaluation of 19 state-of-the-art models identifies Virchow2 and H-Optimus-1 as the top performers overall. PathBench provides a reproducible, clinically grounded evaluation platform for model development and objective, evidence-based model selection for clinical deployment.

📝 Abstract
The emergence of pathology foundation models (PFMs) has revolutionized computational histopathology, enabling highly accurate, generalized whole-slide image analysis for improved cancer diagnosis and prognosis assessment. While these models show remarkable potential across cancer diagnostics and prognostics, their clinical translation faces critical challenges, including variability in the optimal model across cancer types, potential data leakage in evaluation, and a lack of standardized benchmarks. Without rigorous, unbiased evaluation, even the most advanced PFMs risk remaining confined to research settings, delaying their life-saving applications. Existing benchmarking efforts remain limited by narrow cancer-type focus, potential pretraining data overlaps, or incomplete task coverage. We present PathBench, the first comprehensive benchmark addressing these gaps through: multi-center in-house datasets spanning common cancers with rigorous leakage prevention, evaluation across the full clinical spectrum from diagnosis to prognosis, and an automated leaderboard system for continuous model assessment. Our framework incorporates large-scale data, enabling objective comparison of PFMs while reflecting real-world clinical complexity. All evaluation data comes from private medical providers, with strict exclusion of any pretraining usage to avoid data leakage risks. We have collected 15,888 WSIs from 8,549 patients across 10 hospitals, encompassing over 64 diagnosis and prognosis tasks. Currently, our evaluation of 19 PFMs shows that Virchow2 and H-Optimus-1 are the most effective models overall. This work provides researchers with a robust platform for model development and offers clinicians actionable insights into PFM performance across diverse clinical scenarios, ultimately accelerating the translation of these transformative technologies into routine pathology practice.
Problem

Research questions and friction points this paper is trying to address.

Evaluating pathology foundation models' performance across diverse cancer types
Addressing data leakage risks in model evaluation and benchmarking
Standardizing benchmarks for clinical diagnosis and prognosis tasks
Innovation

Methods, ideas, or system contributions that make the work stand out.

Multi-center private datasets, strictly excluded from pretraining, prevent data leakage
Automated leaderboard for continuous assessment
Large-scale data enables objective comparison
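
One way a multi-task leaderboard can turn many per-task scores into a single ordering is mean-rank aggregation. The sketch below is only an illustration of that general idea; the summary does not say how PathBench aggregates results, and the function name `mean_rank_leaderboard`, the model names, the task names, and every score here are hypothetical placeholders rather than reported results.

```python
from statistics import mean


def mean_rank_leaderboard(scores):
    """Rank models by their average per-task rank.

    `scores` maps task name -> {model name: metric}, where a higher
    metric is assumed to be better. Ties are not handled specially:
    tied models keep the order produced by the stable sort.
    """
    ranks = {}
    for per_model in scores.values():
        # Best score first, so position 1 is the top rank on this task.
        ordered = sorted(per_model.items(), key=lambda kv: kv[1], reverse=True)
        for position, (model, _) in enumerate(ordered, start=1):
            ranks.setdefault(model, []).append(position)
    # Lower mean rank is better.
    return sorted(((m, mean(r)) for m, r in ranks.items()), key=lambda kv: kv[1])


# Hypothetical scores for three placeholder models on three placeholder tasks.
scores = {
    "task_1": {"model_a": 0.90, "model_b": 0.85, "model_c": 0.80},
    "task_2": {"model_a": 0.70, "model_b": 0.75, "model_c": 0.60},
    "task_3": {"model_a": 0.88, "model_b": 0.80, "model_c": 0.82},
}

board = mean_rank_leaderboard(scores)
for model, avg_rank in board:
    print(f"{model}: mean rank {avg_rank:.2f}")
```

Averaging ranks rather than raw metrics keeps tasks with different metric scales comparable, at the cost of discarding how large the gap between models is on any single task.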
Jiabo Ma
Department of Computer Science and Engineering, Hong Kong University of Science and Technology, Hong Kong, China
Yingxue Xu
The Hong Kong University of Science and Technology
Multimodal Learning, Survival Analysis, Computational Pathology
Fengtao Zhou
Hong Kong University of Science and Technology
Multimodal Learning, Computational Pathology
Yihui Wang
Department of Computer Science and Engineering, Hong Kong University of Science and Technology, Hong Kong, China
Cheng Jin
Department of Computer Science and Engineering, Hong Kong University of Science and Technology, Hong Kong, China
Zhengrui Guo
Department of Computer Science and Engineering, Hong Kong University of Science and Technology, Hong Kong, China
Jianfeng Wu
State Key Laboratory of Holistic Integrative Management of Gastrointestinal Cancers, Department of Pathology, School of Basic Medicine and Xijing Hospital, Fourth Military Medical University, Xi’an, China
On Ki Tang
Department of Anatomical and Cellular Pathology, Chinese University of Hong Kong, Hong Kong, China
Huajun Zhou
The Hong Kong University of Science and Technology
Computer Vision, Medical Image Processing
Xi Wang
Department of Computer Science and Engineering, Hong Kong University of Science and Technology, Hong Kong, China
Luyang Luo
Department of Computer Science and Engineering, Hong Kong University of Science and Technology, Hong Kong, China; Department of Biomedical Informatics, Harvard University, Boston, USA
Zhengyu Zhang
Department of Pathology, Nanfang Hospital and School of Basic Medical Sciences, Southern Medical University, Guangzhou, China
Du Cai
Department of General Surgery (Colorectal Surgery), The Sixth Affiliated Hospital, Sun Yat-sen University, Guangzhou, China; Guangdong Provincial Key Laboratory of Colorectal and Pelvic Floor Diseases, The Sixth Affiliated Hospital, Sun Yat-sen University, Guangzhou, China; Biomedical Innovation Center, The Sixth Affiliated Hospital, Sun Yat-sen University, Guangzhou, China
Zizhao Gao
State Key Laboratory of Holistic Integrative Management of Gastrointestinal Cancers, Department of Pathology, School of Basic Medicine and Xijing Hospital, Fourth Military Medical University, Xi’an, China
Wei Wang
Department of Pathology, The First Affiliated Hospital of USTC, Division of Life Sciences and Medicine, University of Science and Technology of China, Hefei, China
Yueping Liu
Department of Pathology, The Fourth Hospital of Hebei Medical University, Shijiazhuang, China
Jiankun He
Department of Pathology, The Fourth Hospital of Hebei Medical University, Shijiazhuang, China
Jing Cui
PhD Student, Research School of Computer Science, Australian National University
Temporal Planning, Scheduling, Dynamic Controllability, Artificial Intelligence
Zhenhui Li
the Third Affiliated Hospital of Kunming Medical University, Yunnan Cancer Hospital, Yunnan Cancer
radiomics, pathomics, colorectal cancer
Jing Zhang
Department of Pathology, The First Affiliated Hospital, School of Medicine, Zhejiang University, Hangzhou, China
Feng Gao
Department of General Surgery (Colorectal Surgery), The Sixth Affiliated Hospital, Sun Yat-sen University, Guangzhou, China; Guangdong Provincial Key Laboratory of Colorectal and Pelvic Floor Diseases, The Sixth Affiliated Hospital, Sun Yat-sen University, Guangzhou, China; Biomedical Innovation Center, The Sixth Affiliated Hospital, Sun Yat-sen University, Guangzhou, China
Xiuming Zhang
Department of Pathology, The First Affiliated Hospital, School of Medicine, Zhejiang University, Hangzhou, China
Li Liang
The University of Western Australia
3D Point Cloud Processing, 3D Semantic Scene Completion, 3D Semantic Scene Generation
Ronald Cheong Kin Chan
Department of Anatomical and Cellular Pathology, Chinese University of Hong Kong, Hong Kong, China
Zhe Wang
State Key Laboratory of Holistic Integrative Management of Gastrointestinal Cancers, Department of Pathology, School of Basic Medicine and Xijing Hospital, Fourth Military Medical University, Xi’an, China; Department of Anatomical and Cellular Pathology, Chinese University of Hong Kong, Hong Kong, China
Hao Chen
Department of Computer Science and Engineering, Hong Kong University of Science and Technology, Hong Kong, China; Department of Chemical and Biological Engineering, Hong Kong University of Science and Technology, Hong Kong, China; Division of Life Science, Hong Kong University of Science and Technology, Hong Kong, China; State Key Laboratory of Molecular Neuroscience, The Hong Kong University of Science and Technology, Hong Kong, China; HKUST Shenzhen-Hong Kong Collaborative Innovation Research Institute, F