A Multicentric Dataset for Training and Benchmarking Breast Cancer Segmentation in H&E Slides

📅 2025-10-02
🤖 AI Summary
Existing public breast cancer segmentation datasets lack the morphological diversity needed for models to generalize across heterogeneous patient cohorts and for robust biomarker validation. To address this, the authors introduce BEETLE, a multicentric dataset for multiclass semantic segmentation of H&E-stained whole-slide images, with particular focus on morphologies underrepresented in existing datasets, such as ductal carcinoma in situ and dispersed lobular tumor cells. BEETLE comprises 587 biopsies and resections from three collaborating clinical centers and two public datasets, digitized using seven scanners, and covers all molecular subtypes and histological grades. Annotations span four classes: invasive epithelium, non-invasive epithelium, necrosis, and other. A curated, multicentric external evaluation set is provided for standardized benchmarking of breast cancer segmentation models. The dataset is intended to support reproducible automated biomarker quantification in breast cancer histopathology.

📝 Abstract
Automated semantic segmentation of whole-slide images (WSIs) stained with hematoxylin and eosin (H&E) is essential for large-scale artificial intelligence-based biomarker analysis in breast cancer. However, existing public datasets for breast cancer segmentation lack the morphological diversity needed to support model generalizability and robust biomarker validation across heterogeneous patient cohorts. We introduce BrEast cancEr hisTopathoLogy sEgmentation (BEETLE), a dataset for multiclass semantic segmentation of H&E-stained breast cancer WSIs. It consists of 587 biopsies and resections from three collaborating clinical centers and two public datasets, digitized using seven scanners, and covers all molecular subtypes and histological grades. Using diverse annotation strategies, we collected annotations across four classes - invasive epithelium, non-invasive epithelium, necrosis, and other - with particular focus on morphologies underrepresented in existing datasets, such as ductal carcinoma in situ and dispersed lobular tumor cells. The dataset's diversity and relevance to the rapidly growing field of automated biomarker quantification in breast cancer ensure its high potential for reuse. Finally, we provide a well-curated, multicentric external evaluation set to enable standardized benchmarking of breast cancer segmentation models.
Problem

Research questions and friction points this paper is trying to address.

Addressing limited morphological diversity in breast cancer segmentation datasets
Providing multiclass annotations for underrepresented breast cancer morphologies
Establishing standardized benchmarks for breast cancer segmentation model evaluation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Multicentric dataset with diverse breast cancer morphologies
Combines biopsies and resections from three clinical centers and two public datasets
Provides standardized evaluation set for segmentation benchmarking
Authors

Carlijn Lems
Department of Pathology, Radboud University Medical Center, Nijmegen, The Netherlands

Leslie Tessier
Department of Pathology, Radboud University Medical Center, Nijmegen, The Netherlands

John-Melle Bokhorst
Department of Pathology, Radboud University Medical Center, Nijmegen, The Netherlands

Mart van Rijthoven
Department of Pathology, Radboud University Medical Center, Nijmegen, The Netherlands

Witali Aswolinskiy
Deep learning & Computational Pathology, Paicon GmbH
deep learning, neural networks

Matteo Pozzi
Fondazione Bruno Kessler, Trento, Italy

Natalie Klubickova
Biopticka Laboratory Ltd., Pilsen, Czech Republic

Suzanne Dintzis
University of Washington Medical Center, Seattle, Washington, United States

Michela Campora
Department of Surgical Pathology, Santa Chiara Hospital, APSS, Trento, Italy

Maschenka Balkenhol
Department of Pathology, Radboud University Medical Center, Nijmegen, The Netherlands

Peter Bult
Department of Pathology, Radboud University Medical Center, Nijmegen, The Netherlands

Joey Spronck
Department of Pathology, Radboud University Medical Center, Nijmegen, The Netherlands

Thomas Detone
Department of Surgical Pathology, Santa Chiara Hospital, APSS, Trento, Italy

Mattia Barbareschi
S. Chiara Hospital, Trento, Italy
Surgical pathology, Molecular biology

Enrico Munari
Pathology Unit, University and Hospital Trust of Verona, Verona, Italy

Giuseppe Bogina
IRCCS Sacro Cuore Don Calabria Hospital, Negrar di Valpolicella, Verona, Italy

Jelle Wesseling
Division of Molecular Pathology, Netherlands Cancer Institute, Amsterdam, The Netherlands

Esther H. Lips
Division of Molecular Pathology, Netherlands Cancer Institute, Amsterdam, The Netherlands

Francesco Ciompi
Radboud University Medical Center, Nijmegen
Deep Learning, Computational Pathology, Medical Image Analysis, Computer Aided Diagnosis

Frédérique Meeuwsen
Pathologist/Researcher at the Computational Pathology Group, Radboudumc, Nijmegen, the Netherlands

Jeroen van der Laak
Radboud University Medical Center
Digital Pathology, Computational Pathology, Deep Learning, Image Analysis