Validation of Diagnostic Artificial Intelligence Models for Prostate Pathology in a Middle Eastern Cohort

📅 2025-12-19

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This study addresses the lack of validation of prostate pathology AI models in Middle Eastern populations. It reports the first external validation on a cohort from the Kurdistan region of Iraq (339 biopsy specimens from 185 patients), a setting under-represented in clinical AI evaluation. Slides were digitised on three scanner models (Hamamatsu, Leica, and Grundium), and the resulting dataset is described as the first openly available digital pathology dataset from the Middle East. A task-specific end-to-end model and two pathology foundation models were evaluated for concordance with pathologists and for consistency across scanners. For Gleason grading, AI-pathologist agreement (quadratically weighted κ=0.801) was similar to pathologist-pathologist agreement (κ=0.799; p=0.982). Cross-scanner agreement exceeded κ=0.90 for all AI models and scanner pairs, including a low-cost compact scanner (e.g., Grundium), suggesting that such scanners can support AI-assisted pathology in resource-constrained settings.

📝 Abstract

Background: Artificial intelligence (AI) is improving the efficiency and accuracy of cancer diagnostics. The performance of pathology AI systems has been almost exclusively evaluated on European and US cohorts from large centers. For global AI adoption in pathology, validation studies are needed on currently under-represented populations, where the potential gains from AI support may also be greatest. We present the first study with an external validation cohort from the Middle East, focusing on AI-based diagnosis and Gleason grading of prostate cancer. Methods: We collected and digitised 339 prostate biopsy specimens from the Kurdistan region, Iraq, representing a consecutive series of 185 patients spanning the period 2013-2024. We evaluated a task-specific end-to-end AI model and two foundation models in terms of their concordance with pathologists and consistency across samples digitised on three scanner models (Hamamatsu, Leica, and Grundium). Findings: Grading concordance between AI and pathologists was similar to pathologist-pathologist concordance, with Cohen's quadratically weighted kappa of 0.801 vs. 0.799 (p=0.9824). Cross-scanner concordance was high (quadratically weighted kappa > 0.90) for all AI models and scanner pairs, including a low-cost compact scanner. Interpretation: AI models demonstrated pathologist-level performance in prostate histopathology assessment. Compact scanners can provide a route for validation studies in non-digitalised settings and enable cost-effective adoption of AI in laboratories with limited sample volumes. This first openly available digital pathology dataset from the Middle East supports further research into globally equitable AI pathology. Funding: SciLifeLab and Wallenberg Data Driven Life Science Program, Instrumentarium Science Foundation, Karolinska Institutet Research Foundation.
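
The concordance figures above are Cohen's quadratically weighted kappa values. As a rough illustration of how that statistic is computed for two raters, here is a minimal Python sketch. It is not the authors' code: the function name, the integer encoding of grades (0 for benign, 1 to 5 for grade groups), and the example ratings are all assumptions made for illustration.

```python
from collections import Counter


def quadratic_weighted_kappa(ratings_a, ratings_b, num_categories):
    """Cohen's kappa with quadratic weights for two raters.

    Ratings are assumed to be integers in 0..num_categories-1 on an
    ordinal scale (hypothetical encoding: 0 = benign, 1..5 = grade group).
    """
    if len(ratings_a) != len(ratings_b) or len(ratings_a) == 0:
        raise ValueError("both raters must grade the same non-empty set of cases")
    if num_categories < 2:
        raise ValueError("need at least two categories")

    n = len(ratings_a)
    observed = Counter(zip(ratings_a, ratings_b))  # joint counts per (a, b) pair
    marginal_a = Counter(ratings_a)                # per-category counts, rater A
    marginal_b = Counter(ratings_b)                # per-category counts, rater B
    max_sq_dist = (num_categories - 1) ** 2

    observed_disagreement = 0.0
    expected_disagreement = 0.0
    for i in range(num_categories):
        for j in range(num_categories):
            # Quadratic weight: the penalty grows with squared grade distance.
            weight = (i - j) ** 2 / max_sq_dist
            observed_disagreement += weight * observed[(i, j)] / n
            expected_disagreement += weight * marginal_a[i] * marginal_b[j] / (n * n)

    if expected_disagreement == 0.0:
        raise ValueError("kappa is undefined when both raters use a single category")
    return 1.0 - observed_disagreement / expected_disagreement


# Hypothetical example: grades assigned by an AI model and a pathologist.
ai_grades = [0, 1, 2, 2, 3, 5, 4, 0]
pathologist_grades = [0, 1, 2, 3, 3, 5, 5, 0]
kappa = quadratic_weighted_kappa(ai_grades, pathologist_grades, num_categories=6)
```

Because the weights are quadratic, a disagreement of several grade steps is penalised much more heavily than an adjacent-grade disagreement, which is why this variant is commonly used for ordinal grading scales.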

Problem

Research questions and friction points this paper is trying to address.

Validates AI models for prostate cancer diagnosis in Middle Eastern populations

Assesses AI grading concordance with pathologists across multiple scanner types

Evaluates compact scanners for cost-effective AI adoption in low-resource settings

Innovation

Methods, ideas, or system contributions that make the work stand out.

Validated AI models on Middle Eastern prostate pathology cohort

Used compact scanners for cost-effective digital pathology adoption

Showed pathologist-level grading concordance and high cross-scanner consistency (quadratically weighted kappa > 0.90)
