🤖 AI Summary
This work addresses the lack of systematic, reproducible evaluation of foundation models for dense prediction tasks in computational pathology. To this end, we introduce PFM-DenseBench, a large-scale benchmark encompassing 17 pathology foundation models and 18 publicly available segmentation datasets, enabling a comprehensive comparison of diverse adaptation and fine-tuning strategies under a unified protocol. Standardized training pipelines, containerized deployment, and dataset cards make the framework reproducible and rigorous. The study reveals the conditions under which specific models and strategies succeed or fail across heterogeneous pathology data, yielding actionable guidance for practical model selection and optimization. All tools, configurations, and evaluation protocols are openly released to support community-driven progress in the field.
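For intuition, here is a minimal sketch of what two common adaptation strategies might look like when a transformer-based PFM backbone is used as a segmentation encoder: decoder-only probing with a frozen backbone versus full fine-tuning. The names (`SegmentationHead`, `configure_adaptation`) and the token-layout assumptions are hypothetical illustrations, not the released benchmark code.

```python
# Hypothetical sketch of two PFM adaptation strategies for segmentation.
# Assumes a ViT-style backbone that returns patch tokens (B, N, C) with
# the class token already removed, so N == (h/patch) * (w/patch).
import torch
import torch.nn as nn

class SegmentationHead(nn.Module):
    """Lightweight decoder mapping patch-token features to a dense mask."""
    def __init__(self, embed_dim: int, num_classes: int, patch_size: int = 16):
        super().__init__()
        self.proj = nn.Conv2d(embed_dim, num_classes, kernel_size=1)
        self.patch_size = patch_size

    def forward(self, tokens: torch.Tensor, h: int, w: int) -> torch.Tensor:
        # (B, N, C) patch tokens -> (B, C, h/ps, w/ps) feature map
        b, n, c = tokens.shape
        feat = tokens.transpose(1, 2).reshape(
            b, c, h // self.patch_size, w // self.patch_size
        )
        logits = self.proj(feat)
        # Upsample coarse logits back to full input resolution.
        return nn.functional.interpolate(
            logits, size=(h, w), mode="bilinear", align_corners=False
        )

def configure_adaptation(backbone: nn.Module, head: nn.Module, strategy: str):
    """Return the parameters to optimize under a given adaptation strategy."""
    if strategy == "frozen":   # decoder probing: pretrained features stay fixed
        for p in backbone.parameters():
            p.requires_grad = False
        return list(head.parameters())
    if strategy == "full":     # full fine-tuning: update backbone and head
        return list(backbone.parameters()) + list(head.parameters())
    raise ValueError(f"unknown strategy: {strategy}")
```

Freezing the backbone isolates the quality of the pretrained features, while full fine-tuning measures how far those features can be pushed at the cost of more compute and potential training instability.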
📝 Abstract
Pathology foundation models (PFMs) have advanced rapidly and are becoming a common backbone for downstream clinical tasks, offering strong transferability across tissues and institutions. For dense prediction (e.g., segmentation), however, practitioners still lack a clear, reproducible picture of how different PFMs behave across datasets and how adaptation choices affect performance and stability. We present PFM-DenseBench, a large-scale benchmark for dense pathology prediction that evaluates 17 PFMs across 18 public segmentation datasets. Under a unified protocol, we systematically assess PFMs with multiple adaptation and fine-tuning strategies, and derive practice-oriented findings on when and why different PFMs and tuning choices succeed or fail across heterogeneous datasets. We release containers, configs, and dataset cards to enable reproducible evaluation and informed PFM selection for real-world dense pathology tasks. Project Website: https://m4a1tastegood.github.io/PFM-DenseBench
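As a rough sketch of what a unified protocol over the model × dataset grid could look like, the snippet below scores every (PFM, dataset) pair with one fixed Dice metric. The registry, loader, and binary-segmentation assumptions (`pfm_registry`, `load_dataset`, logits of shape `(B, 1, H, W)`) are hypothetical placeholders, not the project's released API.

```python
# Hypothetical sketch of a unified model x dataset evaluation grid.
# Assumes binary segmentation: models emit logits (B, 1, H, W) and
# loaders yield (image, mask) batches with {0, 1} masks of shape (B, H, W).
import torch

def dice_score(pred: torch.Tensor, target: torch.Tensor, eps: float = 1e-6) -> float:
    """Binary Dice coefficient between {0, 1} masks of the same shape."""
    pred, target = pred.float(), target.float()
    inter = (pred * target).sum()
    return float((2 * inter + eps) / (pred.sum() + target.sum() + eps))

def evaluate_grid(pfm_registry: dict, dataset_names: list, load_dataset):
    """Evaluate every (PFM, dataset) pair under one fixed protocol."""
    device = "cuda" if torch.cuda.is_available() else "cpu"
    results = {}
    for model_name, build_model in pfm_registry.items():
        model = build_model().to(device).eval()
        for ds_name in dataset_names:
            loader = load_dataset(ds_name)  # yields (image, mask) batches
            scores = []
            with torch.no_grad():
                for images, masks in loader:
                    logits = model(images.to(device))
                    # Threshold sigmoid outputs and drop the channel dim
                    # so predictions match the (B, H, W) mask shape.
                    preds = (logits.sigmoid() > 0.5).squeeze(1).cpu()
                    scores.append(dice_score(preds, masks))
            results[(model_name, ds_name)] = sum(scores) / max(len(scores), 1)
    return results
```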