Prostate biopsy whole slide image dataset from an underrepresented Middle Eastern population

📅 2025-12-03

🤖 AI Summary

Problem: Publicly available whole-slide image (WSI) datasets for histopathology are predominantly derived from Western populations. Regions such as the Middle East, where digital pathology infrastructure remains limited, are severely underrepresented, so the cross-population generalizability of AI models is largely untested. Method: We present a prostate core needle biopsy WSI dataset from Erbil, Iraq, comprising 339 WSIs from a consecutive series of 185 patients. Slides were scanned with Leica, Hamamatsu, and Grundium scanners and are provided in their native formats. Three pathologists independently assigned Gleason scores and ISUP grade groups, and all slides were de-identified. Contribution/Results: The dataset helps fill a gap in Middle Eastern digital prostate pathology. It supports cross-scanner robustness evaluation, color normalization research, and inter-observer grading concordance analysis. It is to be deposited in the BioImage Archive under a CC BY 4.0 license, supporting the validation of AI models across diverse populations and heterogeneous scanning platforms.

📝 Abstract

Artificial intelligence (AI) is increasingly used in digital pathology. Publicly available histopathology datasets remain scarce, and those that do exist predominantly represent Western populations. Consequently, the generalizability of AI models to populations from less digitized regions, such as the Middle East, is largely unknown. This motivates the public release of our dataset to support the development and validation of pathology AI models across globally diverse populations. We present 339 whole-slide images of prostate core needle biopsies from a consecutive series of 185 patients collected in Erbil, Iraq. The slides are associated with Gleason scores and International Society of Urological Pathology grades assigned independently by three pathologists. Scanning was performed using two high-throughput scanners (Leica and Hamamatsu) and one compact scanner (Grundium). All slides were de-identified and are provided in their native formats without further conversion. The dataset enables grading concordance analyses, color normalization, and cross-scanner robustness evaluations. Data will be deposited in the BioImage Archive (BIA) under accession code: to be announced (TBA), and released under a CC BY 4.0 license.

Problem

Research questions and friction points this paper is trying to address.

Addresses scarcity of non-Western histopathology datasets for AI

Enables validation of AI models on Middle Eastern prostate biopsy images

Supports grading concordance and cross-scanner robustness analyses

Innovation

Methods, ideas, or system contributions that make the work stand out.

Publicly releasing a Middle Eastern prostate biopsy dataset

Using multiple scanners for cross-scanner robustness evaluations

Providing native-format slides with independent pathologist grades
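
Because each slide carries grades from three independent pathologists, one common way to quantify grading concordance is Fleiss' kappa. The following is a minimal sketch and not the authors' analysis: the function name `fleiss_kappa`, the per-slide list layout, and the example grades are all hypothetical, and the dataset's actual annotation file format is not described here. It treats grade groups as unordered categories, so a weighted statistic may be more appropriate when the ordinal distance between grades matters.

```python
from collections import Counter


def fleiss_kappa(ratings):
    """Unweighted Fleiss' kappa.

    ratings: list of per-slide lists; each inner list holds one categorical
    label per rater (for example, ISUP grade groups from three pathologists).
    Every slide must have the same number of raters.
    """
    if not ratings:
        raise ValueError("ratings must contain at least one slide")
    n_items = len(ratings)
    n_raters = len(ratings[0])
    if n_raters < 2:
        raise ValueError("at least two raters are required")

    category_totals = Counter()
    agreement_sum = 0.0
    for row in ratings:
        if len(row) != n_raters:
            raise ValueError("every slide needs the same number of ratings")
        counts = Counter(row)
        category_totals.update(counts)
        # Fraction of rater pairs that agree on this slide.
        agreeing_pairs = sum(c * c for c in counts.values()) - n_raters
        agreement_sum += agreeing_pairs / (n_raters * (n_raters - 1))

    observed = agreement_sum / n_items
    total_ratings = n_items * n_raters
    # Chance agreement from the pooled category proportions.
    expected = sum((c / total_ratings) ** 2 for c in category_totals.values())
    if expected == 1.0:
        # All ratings fall in a single category: kappa is undefined.
        return float("nan")
    return (observed - expected) / (1.0 - expected)


# Hypothetical grades for five slides, three raters each (made-up values).
example_grades = [
    [1, 1, 1],
    [2, 2, 3],
    [3, 3, 3],
    [4, 5, 5],
    [2, 2, 2],
]
print(fleiss_kappa(example_grades))
```

Values near 1 indicate agreement well above chance, values near 0 indicate agreement no better than chance, and negative values indicate agreement below chance.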
