🤖 AI Summary
This study investigates how tissue detection quality affects downstream AI diagnostic performance in digital pathology. Focusing on prostate cancer whole-slide images (WSIs), it compares conventional thresholding with a UNet++-based tissue segmentation model and evaluates the effect of each on an AI model for Gleason grading, using data from multiple clinical sites and scanners. Switching from thresholding to AI-based tissue detection reduced the number of fully undetected tissue samples from 116 (0.43%) to 22 (0.08%). On slides where both methods detected tissue, overall Gleason grading performance did not differ significantly, but the choice of tissue detection method produced clinically significant differences in AI grading in 3.5% of malignant slides. The work highlights tissue detection as a potential pre-processing bottleneck for clinical-grade pathology AI.
📝 Abstract
Tissue detection is a crucial first step in most digital pathology applications. Details of the segmentation algorithm are rarely reported, and there is a lack of studies investigating the downstream effects of a poor segmentation algorithm. Disregarding tissue detection quality could create a bottleneck for downstream performance and jeopardize patient safety if diagnostically relevant parts of the specimen are excluded from analysis in clinical applications. This study aims to determine whether performance of downstream tasks is sensitive to the tissue detection method, and to compare performance of classical and AI-based tissue detection. To this end, we trained an AI model for Gleason grading of prostate cancer in whole slide images (WSIs) using two different tissue detection algorithms: thresholding (classical) and UNet++ (AI). A total of 33,823 WSIs scanned on five digital pathology scanners were used to train the tissue detection AI model. The downstream Gleason grading algorithm was trained and tested using 70,524 WSIs from 13 clinical sites scanned on 13 different scanners. The number of fully undetected tissue samples decreased from 116 (0.43%) to 22 (0.08%) when switching from thresholding-based to AI-based tissue detection, suggesting that an AI model may be more reliable than a classical model for avoiding total failures on slides with unusual appearance. On the slides where tissue could be detected by both algorithms, no significant difference in overall Gleason grading performance was observed. However, clinically significant variations in AI grading that depended on the tissue detection method were observed in 3.5% of malignant slides, highlighting the importance of robust tissue detection for optimal clinical performance of diagnostic AI.
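To make the classical baseline concrete, the sketch below shows one simple form of thresholding-based tissue detection on a grayscale slide thumbnail, along with a check for the "fully undetected" failure mode. It is an illustration under stated assumptions, not the pipeline used in the study: the function names, the fixed cutoff of 220, and the synthetic thumbnails are all hypothetical, and the abstract does not specify which thresholding variant was used.

```python
import numpy as np


def tissue_mask_by_threshold(gray, threshold=220):
    """Mark pixels darker than `threshold` as tissue.

    Assumes a brightfield thumbnail with bright (near-white) background,
    given as an 8-bit grayscale array. The cutoff value is an illustrative
    assumption, not a value taken from the study.
    """
    gray = np.asarray(gray)
    return gray < threshold


def is_fully_undetected(mask, min_pixels=1):
    """Return True when the mask contains fewer than `min_pixels` tissue pixels."""
    return int(np.count_nonzero(mask)) < min_pixels


# Synthetic thumbnail: bright background with a well-stained tissue block.
thumb = np.full((100, 100), 240, dtype=np.uint8)
thumb[20:60, 30:70] = 120
mask = tissue_mask_by_threshold(thumb)

# Synthetic thumbnail with very faint tissue, brighter than the fixed cutoff.
faint = np.full((100, 100), 240, dtype=np.uint8)
faint[20:60, 30:70] = 230
faint_mask = tissue_mask_by_threshold(faint)

print(int(mask.sum()), is_fully_undetected(mask))
print(int(faint_mask.sum()), is_fully_undetected(faint_mask))
```

In this toy setup, the faint sample falls entirely above the fixed cutoff, so no tissue is detected and nothing would be passed to a downstream grading model. This is the kind of total failure on unusual-looking slides that the abstract counts as a fully undetected sample.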