AI Summary
This study addresses the untested generalizability of prostate pathology AI models to Middle Eastern populations. We conducted the first external validation in Iraqi Kurdistan (339 biopsy specimens from 185 patients), addressing a gap in clinical AI evaluation for under-represented regions. Slides were digitised on scanners from three vendors (Hamamatsu, Leica, Grundium), yielding the first openly available digital pathology dataset from the Middle East. We evaluated diagnostic accuracy and Gleason grading performance using a task-specific end-to-end model, two pathology foundation models, and a cross-device standardization pipeline. Agreement between AI and pathologists on Gleason grading (quadratically weighted κ=0.801) was comparable to inter-pathologist agreement (κ=0.799; p=0.982). Cross-scanner agreement was high (κ>0.90), suggesting that low-cost, compact scanners (e.g., Grundium) can support reliable AI-based assessment in resource-constrained settings.
Abstract
Background: Artificial intelligence (AI) is improving the efficiency and accuracy of cancer diagnostics. However, pathology AI systems have been evaluated almost exclusively on European and US cohorts from large centers. Global adoption of AI in pathology requires validation studies in currently under-represented populations, where the potential gains from AI support may also be greatest. We present the first study with an external validation cohort from the Middle East, focusing on AI-based diagnosis and Gleason grading of prostate cancer.
Methods: We collected and digitised 339 prostate biopsy specimens from a consecutive series of 185 patients in the Kurdistan region of Iraq, spanning 2013–2024. We evaluated a task-specific end-to-end AI model and two foundation models in terms of their concordance with pathologists and their consistency when the same samples were digitised on three scanner models (Hamamatsu, Leica, and Grundium).
Findings: Grading concordance between AI and pathologists was similar to pathologist-pathologist concordance (Cohen's quadratically weighted kappa 0.801 vs. 0.799; p=0.9824). Cross-scanner concordance was high (quadratically weighted kappa > 0.90) for all AI models and scanner pairs, including the low-cost compact scanner.
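Cohen's quadratically weighted kappa, the agreement statistic reported above, penalises a disagreement by the squared distance between the two grades, so a one-grade discrepancy costs far less than a benign-versus-high-grade one. In formula form, κ_w = 1 − Σ w_ij O_ij / Σ w_ij E_ij with w_ij = (i − j)² / (k − 1)², where O holds the observed proportions of items graded i by one rater and j by the other, and E holds the proportions expected by chance from the two raters' marginal grade distributions. The pure-Python sketch below illustrates this computation, both for a single rater pair and for every scanner pair. The function names, the six-class encoding (0 = benign, 1–5 = grade group) and the toy grades are illustrative assumptions, not the study's code or data. The abstract also does not state how the p-value comparing two kappas was obtained, so that step is not sketched here.

```python
from itertools import combinations


def quadratic_weighted_kappa(a, b, num_classes):
    """Cohen's kappa with quadratic disagreement weights for two raters.

    a, b: equal-length sequences of integer grades in 0..num_classes-1.
    """
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("ratings must be non-empty and of equal length")
    if num_classes < 2:
        raise ValueError("need at least two grade classes")
    k, n = num_classes, len(a)

    # Observed confusion matrix: observed[i][j] counts items graded i by a and j by b.
    observed = [[0] * k for _ in range(k)]
    for x, y in zip(a, b):
        observed[x][y] += 1

    # Marginal grade histograms of each rater (used for chance agreement).
    hist_a = [sum(row) for row in observed]
    hist_b = [sum(observed[i][j] for i in range(k)) for j in range(k)]

    observed_dis = 0.0  # weighted disagreement actually observed
    expected_dis = 0.0  # weighted disagreement expected if raters were independent
    for i in range(k):
        for j in range(k):
            w = (i - j) ** 2 / (k - 1) ** 2  # quadratic weight, 0 on the diagonal
            observed_dis += w * observed[i][j] / n
            expected_dis += w * hist_a[i] * hist_b[j] / (n * n)

    if expected_dis == 0:
        # Both raters gave the same single grade to every item: kappa is undefined.
        return float("nan")
    return 1.0 - observed_dis / expected_dis


def pairwise_kappas(grades_by_source, num_classes):
    """Quadratic kappa for every pair of sources (e.g. scanners) grading the same items."""
    return {
        (s1, s2): quadratic_weighted_kappa(
            grades_by_source[s1], grades_by_source[s2], num_classes
        )
        for s1, s2 in combinations(sorted(grades_by_source), 2)
    }


# Toy AI-predicted grades for the same five hypothetical biopsies digitised on
# each scanner (NOT data from the study); 0 = benign, 1-5 = grade group (assumed).
toy = {
    "Hamamatsu": [0, 1, 2, 3, 5],
    "Leica": [0, 1, 2, 4, 5],
    "Grundium": [0, 1, 3, 3, 5],
}
for pair, kappa in pairwise_kappas(toy, num_classes=6).items():
    print(pair, round(kappa, 3))
```

The same function covers AI-versus-pathologist comparisons. The (k − 1)² normalisation cancels in the ratio and only scales the weights to [0, 1]. In practice, scikit-learn's `cohen_kappa_score(y1, y2, weights="quadratic")` computes the same statistic.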
Interpretation: AI models demonstrated pathologist-level performance in prostate histopathology assessment. Compact scanners can provide a route to validation studies in non-digitalised settings and enable cost-effective adoption of AI in laboratories with limited sample volumes. This dataset, the first openly available digital pathology dataset from the Middle East, supports further research into globally equitable AI pathology.
Funding: SciLifeLab and Wallenberg Data Driven Life Science Program, Instrumentarium Science Foundation, Karolinska Institutet Research Foundation.