🤖 AI Summary
Existing cell nuclei instance segmentation models exhibit insufficient cross-dataset generalization on kidney histopathology images. Method: We introduce the largest benchmark dataset to date for kidney pathology, comprising 2,542 whole-slide images from human and rodent tissues, and conduct the first systematic, cross-species, multi-stain, multi-tissue evaluation of three foundation models: Cellpose, StarDist, and CellViT. We propose a comprehensive assessment framework integrating quantitative instance segmentation metrics, multi-dimensional prediction distribution analysis, cross-dataset generalization testing, and pathology-consistency validation. Results: All three models suffer >23% average F1-score degradation on complex renal tissue compared to ideal conditions; although CellViT achieves the highest performance, a substantial gap remains, underscoring the necessity of kidney-specific modeling. This work establishes a new standard for rigorous, clinically grounded evaluation of computational pathology models.
📝 Abstract
Cell nuclei instance segmentation is a crucial task in digital kidney pathology. Traditional automatic segmentation methods often generalize poorly to unseen datasets. Recently, the success of foundation models (FMs) has provided a more generalizable solution, potentially enabling the segmentation of any cell type. In this study, we perform a large-scale evaluation of three widely used state-of-the-art (SOTA) cell nuclei foundation models (Cellpose, StarDist, and CellViT). Specifically, we created a highly diverse evaluation dataset consisting of 2,542 kidney whole-slide images (WSIs) collected from both human and rodent sources, encompassing various tissue types, sizes, and staining methods. To our knowledge, this is the largest-scale evaluation of its kind to date. Our quantitative analysis of the prediction distribution reveals a persistent performance gap in kidney pathology. Among the evaluated models, CellViT demonstrated superior performance in segmenting nuclei in kidney pathology. However, none of the foundation models is perfect; a gap remains in general nuclei segmentation for kidney pathology.