A tissue and cell-level annotated H&E and PD-L1 histopathology image dataset in non-small cell lung cancer

📅 2025-07-21
🤖 AI Summary
Existing NSCLC digital pathology datasets are limited in scope, lack annotations of metastatic sites, and omit molecular information such as PD-L1 immunohistochemistry (IHC). To address these limitations, we introduce the IGNITE data toolkit, a public, multi-centric, multi-stain (H&E and PD-L1 IHC), multi-scanner dataset of annotated NSCLC whole-slide images, comprising 887 fully annotated regions of interest from 155 patients. Annotations cover three complementary tasks: semantic segmentation of tissue compartments in H&E-stained slides (16 classes spanning primary and metastatic NSCLC), nuclei detection, and PD-L1-positive tumor cell detection in PD-L1 IHC slides. To the best of our knowledge, this is the first public NSCLC dataset with manual annotations of H&E in metastatic sites and of PD-L1 IHC. The dataset supports benchmarking of tissue segmentation and cell detection algorithms for computational analysis of the tumor immune microenvironment and for biomarker development.

📝 Abstract
The tumor immune microenvironment (TIME) in non-small cell lung cancer (NSCLC) histopathology contains morphological and molecular characteristics predictive of immunotherapy response. Computational quantification of TIME characteristics, such as cell detection and tissue segmentation, can support biomarker development. However, currently available digital pathology datasets of NSCLC for the development of cell detection or tissue segmentation algorithms are limited in scope, lack annotations of clinically prevalent metastatic sites, and forgo molecular information such as PD-L1 immunohistochemistry (IHC). To fill this gap, we introduce the IGNITE data toolkit, a multi-stain, multi-centric, and multi-scanner dataset of annotated NSCLC whole-slide images. We publicly release 887 fully annotated regions of interest from 155 unique patients across three complementary tasks: (i) multi-class semantic segmentation of tissue compartments in H&E-stained slides, with 16 classes spanning primary and metastatic NSCLC, (ii) nuclei detection, and (iii) PD-L1 positive tumor cell detection in PD-L1 IHC slides. To the best of our knowledge, this is the first public NSCLC dataset with manual annotations of H&E in metastatic sites and PD-L1 IHC.
Problem

Research questions and friction points this paper is trying to address.

Existing NSCLC digital pathology datasets for cell detection and tissue segmentation are limited in scope
Existing datasets omit molecular information such as PD-L1 IHC
Existing datasets lack annotations of clinically prevalent metastatic sites
Innovation

Methods, ideas, or system contributions that make the work stand out.

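Multi-stain, multi-centric, multi-scanner annotated NSCLC whole-slide image dataset (IGNITE data toolkit)
Covers H&E tissue segmentation (16 classes), nuclei detection, and PD-L1 positive tumor cell detection in PD-L1 IHC slides
First public NSCLC dataset with manual annotations of H&E in metastatic sites and PD-L1 IHC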
Authors

Joey Spronck
Department of Pathology, Radboud University Medical Center, Nijmegen, The Netherlands

Leander van Eekelen
Department of Pathology, Radboud University Medical Center, Nijmegen, The Netherlands

Dominique van Midden
Resident Pathology, Radboud University Medical Center
Histopathology, Renal pathology, Dermatopathology, Transplant pathology, Computational pathology

Joep Bogaerts
Department of Pathology, Radboud University Medical Center, Nijmegen, The Netherlands

Leslie Tessier
Department of Pathology, Radboud University Medical Center, Nijmegen, The Netherlands

Valerie Dechering
Department of Pathology, Radboud University Medical Center, Nijmegen, The Netherlands

Muradije Demirel-Andishmand
Department of Pathology, Radboud University Medical Center, Nijmegen, The Netherlands

Gabriel Silva de Souza
Department of Pathology, Radboud University Medical Center, Nijmegen, The Netherlands

Roland Nemeth
Department of Pathology, Radboud University Medical Center, Nijmegen, The Netherlands

Enrico Munari
Pathology Unit, Department of Molecular and Translational Medicine, University of Brescia, Brescia, Italy

Giuseppe Bogina
Department of Pathology, Ospedale Sacro Cuore, Negrar, Verona, Italy

Ilaria Girolami
Department of Pathology, Provincial Hospital of Bolzano (SABES-ASDAA), Bolzano-Bozen, Italy

Albino Eccher
Department of Pathology and Diagnostics, University and Hospital Trust of Verona, Verona, Italy

Balazs Acs
Department of Clinical Pathology and Cancer Diagnostics, Karolinska University Hospital, Stockholm, Sweden

Ceren Boyaci
Department of Clinical Pathology and Cancer Diagnostics, Karolinska University Hospital, Stockholm, Sweden

Natalie Klubickova
Biopticka Laboratory, Ltd, Pilsen, Czech Republic

Monika Looijen-Salamon
Department of Pathology, Radboud University Medical Center, Nijmegen, The Netherlands

Shoko Vos
Department of Pathology, Radboud University Medical Center, Nijmegen, The Netherlands

Francesco Ciompi
Radboud University Medical Center, Nijmegen
Deep Learning, Computational Pathology, Medical Image Analysis, Computer Aided Diagnosis