🤖 AI Summary
Problem: Publicly available whole-slide image (WSI) datasets for histopathology are predominantly derived from Western populations. Regions with limited digital pathology infrastructure, such as the Middle East, are severely underrepresented, which hinders the cross-population generalizability of AI models. Method: We introduce the first prostate needle biopsy WSI dataset from Erbil, Iraq, comprising 339 WSIs from 185 patients, scanned on Leica, Hamamatsu, and Grundium scanners and provided in their native formats. All slides were de-identified and independently annotated by three board-certified pathologists for Gleason score and ISUP grade group. Contribution/Results: The dataset addresses the lack of Middle Eastern data in digital prostate pathology. It supports cross-scanner robustness evaluation, color normalization research, and inter-observer agreement analysis across the three pathologists. It is to be released under a CC BY 4.0 license through the BioImage Archive, supporting reproducible validation of AI models across diverse populations and heterogeneous scanning platforms.
📝 Abstract
Artificial intelligence (AI) is increasingly used in digital pathology. However, publicly available histopathology datasets remain scarce, and those that do exist predominantly represent Western populations. Consequently, the generalizability of AI models to populations from less digitized regions, such as the Middle East, is largely unknown. This motivates the public release of our dataset to support the development and validation of pathology AI models across globally diverse populations. We present 339 whole-slide images of prostate core needle biopsies from a consecutive series of 185 patients collected in Erbil, Iraq. Each slide is associated with Gleason scores and International Society of Urological Pathology (ISUP) grade groups assigned independently by three pathologists. Scanning was performed using two high-throughput scanners (Leica and Hamamatsu) and one compact scanner (Grundium). All slides were de-identified and are provided in their native formats without further conversion. The dataset enables grading concordance analyses, color normalization research, and cross-scanner robustness evaluations. Data will be deposited in the BioImage Archive (BIA) under accession code: to be announced (TBA), and released under a CC BY 4.0 license.
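As an illustration of the grading concordance analyses mentioned above, the following is a minimal sketch, not the authors' analysis code. It assumes each pathologist's grade is available as a pair of Gleason patterns (primary, secondary) per slide, applies the standard Gleason-to-ISUP grade group mapping, and computes pairwise quadratic-weighted Cohen's kappa. The rater names and grades are hypothetical placeholder values, not data from the dataset, and benign slides without a Gleason score would need separate handling.

```python
from itertools import combinations


def isup_grade_group(primary: int, secondary: int) -> int:
    """Map Gleason primary/secondary patterns to an ISUP grade group (1-5)."""
    total = primary + secondary
    if total <= 6:
        return 1
    if total == 7:
        return 2 if primary == 3 else 3  # 3+4 -> group 2, 4+3 -> group 3
    if total == 8:
        return 4
    return 5  # Gleason score 9 or 10


def quadratic_weighted_kappa(a, b, n_classes=5):
    """Cohen's kappa with quadratic weights for labels in 1..n_classes."""
    n = len(a)
    observed = [[0.0] * n_classes for _ in range(n_classes)]
    for x, y in zip(a, b):
        observed[x - 1][y - 1] += 1
    row = [sum(r) for r in observed]
    col = [sum(observed[i][j] for i in range(n_classes)) for j in range(n_classes)]
    num = den = 0.0
    for i in range(n_classes):
        for j in range(n_classes):
            w = (i - j) ** 2 / (n_classes - 1) ** 2  # disagreement weight
            num += w * observed[i][j]
            den += w * row[i] * col[j] / n
    if den == 0:
        return float("nan")  # undefined: both raters used one identical class
    return 1.0 - num / den


# Hypothetical Gleason patterns (primary, secondary) for five slides per rater.
ratings = {
    "pathologist_A": [(3, 3), (3, 4), (4, 3), (4, 4), (4, 5)],
    "pathologist_B": [(3, 3), (3, 4), (3, 4), (4, 4), (5, 4)],
    "pathologist_C": [(3, 4), (3, 4), (4, 3), (4, 5), (4, 5)],
}
groups = {name: [isup_grade_group(p, s) for p, s in pairs]
          for name, pairs in ratings.items()}

for r1, r2 in combinations(groups, 2):
    kappa = quadratic_weighted_kappa(groups[r1], groups[r2])
    print(f"{r1} vs {r2}: quadratic-weighted kappa = {kappa:.3f}")
```

Quadratic weighting is used here because ISUP grade groups are ordinal, so a disagreement of one group is penalized less than a disagreement of three; other agreement statistics may be equally appropriate.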