🤖 AI Summary
This study addresses reproducibility challenges in training histopathology foundation models, which arise from software stochasticity, hardware nondeterminism, and inconsistent hyperparameter reporting, by systematically investigating how hyperparameters and data augmentation strategies affect model stability. Leveraging the CLIP architecture, we pretrain on QUILT-1M and conduct large-scale ablation studies across three downstream benchmarks: PatchCamelyon, LC25000-Lung, and LC25000-Colon. Key findings: RandomResizedCrop scale ratios of 0.7–0.8 and disabling local loss in distributed training improved consistency, while learning rates below 5.0×10⁻⁵ consistently degraded performance; LC25000-Colon emerged as the most reproducible benchmark. We propose the first reproducibility-oriented best-practice guide specifically for digital pathology modeling, providing a methodological foundation for the robust development and evaluation of pathology AI systems.
📝 Abstract
Reproducibility remains a critical challenge in foundation model training for histopathology, often hindered by software randomness, hardware non-determinism, and inconsistent hyperparameter reporting. To investigate these issues, we trained a CLIP model on the QUILT-1M dataset and systematically evaluated the impact of different hyperparameter settings and augmentation strategies across three downstream histopathology datasets (PatchCamelyon, LC25000-Lung, and LC25000-Colon). Despite variability across runs, we identified clear trends: RandomResizedCrop scale values of 0.7–0.8 outperformed both more aggressive (0.6) and more conservative (0.9) settings, distributed training without local loss improved stability, and learning rates below 5.0e-5 consistently degraded performance across all datasets. The LC25000-Colon dataset consistently provided the most reproducible benchmark. These findings highlight that reproducibility in computational pathology depends not only on transparent documentation but also on carefully chosen experimental configurations, and we provide practical rules to guide future efforts toward reproducible foundation models for digital pathology.
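As a minimal sketch of the reported heuristics (the configuration keys and helper below are our own illustrative names, not from the paper; the `scale` tuple follows torchvision's RandomResizedCrop convention), the stable settings can be encoded as a simple pre-flight check on a training configuration:

```python
# Hypothetical config check reflecting the reported trends:
# RandomResizedCrop scale in 0.7-0.8, local loss disabled in
# distributed training, and learning rates >= 5.0e-5
# (lower rates consistently degraded performance).

RECOMMENDED = {
    "random_resized_crop_scale": (0.7, 0.8),  # lower/upper area fraction
    "local_loss": False,                      # disable local loss under DDP
    "min_learning_rate": 5.0e-5,
}

def check_config(cfg: dict) -> list[str]:
    """Return warnings for settings outside the reported stable ranges."""
    warnings = []
    lo, hi = cfg.get("random_resized_crop_scale", (0.0, 1.0))
    if lo < 0.7 or hi > 0.8:
        warnings.append("crop scale outside the stable 0.7-0.8 range")
    if cfg.get("local_loss", True):
        warnings.append("local loss enabled; disabling it improved stability")
    if cfg.get("learning_rate", 0.0) < RECOMMENDED["min_learning_rate"]:
        warnings.append("learning rate below 5.0e-5 degraded performance")
    return warnings
```

Such a check is deliberately conservative: it only flags deviations from the ranges the study found reproducible, leaving the final choice to the practitioner.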