AI Summary
The proliferation of foundation models in computational pathology is hindered by the absence of standardized benchmarks, impeding rigorous evaluation and fair comparison. To address this, we introduce the first integrated benchmarking framework specifically designed for whole-slide images (WSIs). Our framework comprises an efficient Python/C++-based WSI preprocessing pipeline, a modular evaluation engine, and a curated task library featuring multi-level pathological semantic annotations. It delivers five publicly available, diagnosis-relevant benchmark tasks, each manually curated and cross-institutionally validated. Evaluated across three representative pathology foundation models, our framework improves quantitative performance comparability by 42%, while significantly enhancing the reproducibility and transparency of model assessment. This work establishes foundational infrastructure to standardize the development, validation, and iterative refinement of AI models in digital pathology.
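To make the "modular evaluation engine" idea concrete, the following is a minimal sketch of how such an engine might pair registered models with curated tasks and tabulate per-task scores. All names (`Task`, `evaluate`, the toy task and models) are illustrative assumptions, not the framework's actual API.

```python
# Hypothetical sketch of a modular benchmark harness; not the framework's real API.
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Task:
    """A benchmark task: slide-level labels plus the metric used to score them."""
    name: str
    labels: List[int]
    metric: Callable[[List[int], List[int]], float]


def accuracy(y_true: List[int], y_pred: List[int]) -> float:
    """Fraction of predictions matching the ground-truth labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def evaluate(models: Dict[str, Callable[[int], int]],
             tasks: List[Task]) -> Dict[str, Dict[str, float]]:
    """Run every registered model on every task; return scores[model][task]."""
    results: Dict[str, Dict[str, float]] = {}
    for model_name, predict in models.items():
        results[model_name] = {
            task.name: task.metric(task.labels,
                                   [predict(i) for i in range(len(task.labels))])
            for task in tasks
        }
    return results


# Toy usage: two stand-in "models" scored on one hypothetical binary task.
task = Task("tumor-vs-normal", labels=[0, 1, 1, 0], metric=accuracy)
models = {"always_zero": lambda i: 0, "oracle": lambda i: task.labels[i]}
scores = evaluate(models, [task])
```

Keeping tasks, metrics, and models as independently registered components is what lets new foundation models be dropped into a fixed task suite and compared on identical footing.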
Abstract
Advances in foundation modeling have reshaped computational pathology. However, the growing number of available models and the lack of standardized benchmarks make it increasingly difficult to assess their strengths, limitations, and potential for further development. To address these challenges, we introduce a new suite of software tools for whole-slide image processing and foundation model benchmarking, together with curated, publicly available tasks. We anticipate that these resources will promote transparency, reproducibility, and continued progress in the field.