ADPv2: A Hierarchical Histological Tissue Type-Annotated Dataset for Potential Biomarker Discovery of Colorectal Disease

📅 2025-07-08
🤖 AI Summary
Existing computational pathology datasets lack organ-specific, fine-grained histological tissue type (HTT) annotations, which hinders mechanistic studies of colonic diseases. To address this, we introduce ADPv2, a gastrointestinal-focused dataset of 20,004 image patches from healthy colon biopsy slides, expertly annotated according to a three-level hierarchical taxonomy of 32 HTT classes. We train a multi-label representation learning model based on the VMamba architecture in two stages, achieving a mean average precision (mAP) of 0.88 on multi-label classification of colon HTTs. Analyzing the model's predictions on tissue affected by different colon diseases reveals statistical patterns consistent with the two pathological pathways of colon cancer development, which supports the dataset's use for colon-specific modeling and potential biomarker discovery. The dataset is publicly released.

📝 Abstract
Computational pathology (CoPath) leverages histopathology images to enhance diagnostic precision and reproducibility in clinical pathology. However, publicly available CoPath datasets annotated with extensive, granular histological tissue type (HTT) taxonomies remain scarce because of the significant expertise and high annotation costs required. Existing datasets, such as the Atlas of Digital Pathology (ADP), address this by offering diverse HTT annotations generalized to multiple organs, but this generality limits in-depth studies of diseases of a specific organ. Building upon this foundation, we introduce ADPv2, a novel dataset focused on gastrointestinal histopathology. Our dataset comprises 20,004 image patches derived from healthy colon biopsy slides, annotated according to a three-level hierarchical taxonomy of 32 distinct HTTs. Furthermore, we train a multilabel representation learning model on ADPv2 following a two-stage training procedure. We leverage the VMamba architecture and achieve a mean average precision (mAP) of 0.88 in multilabel classification of colon HTTs. Finally, we show that our dataset supports organ-specific, in-depth study for potential biomarker discovery by analyzing the model's prediction behavior on tissues affected by different colon diseases, which reveals statistical patterns that confirm the two pathological pathways of colon cancer development. Our dataset is publicly available here: Part 1 at https://zenodo.org/records/15307021, Part 2 at https://zenodo.org/records/15312384 and Part 3 at https://zenodo.org/records/15312792
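For readers unfamiliar with the metric quoted in the abstract, the following is a minimal sketch of how mean average precision is commonly computed for multilabel classification: average precision is calculated per class from the ranked scores and then averaged over classes. The function names, the toy labels, and the toy scores are illustrative assumptions; the paper's exact evaluation protocol (tie handling, class weighting, treatment of classes without positives) is not stated here and may differ.

```python
import numpy as np

def average_precision(y_true, y_score):
    """AP for one class: mean of the precision values at the rank of each positive."""
    order = np.argsort(-y_score, kind="stable")   # rank patches by descending score
    hits = y_true[order].astype(float)            # 1.0 where the ranked patch is a positive
    n_pos = hits.sum()
    if n_pos == 0:
        return float("nan")                       # AP is undefined without positives
    ranks = np.arange(1, len(hits) + 1)
    precision_at_k = np.cumsum(hits) / ranks
    return float((precision_at_k * hits).sum() / n_pos)

def mean_average_precision(Y_true, Y_score):
    """Macro mAP: unweighted mean of per-class AP, skipping classes with no positives."""
    aps = [average_precision(Y_true[:, c], Y_score[:, c])
           for c in range(Y_true.shape[1])]
    return float(np.nanmean(aps))

# Toy data (assumed, not from ADPv2): 4 patches, 2 tissue-type classes.
Y_true = np.array([[1, 0], [0, 1], [1, 1], [0, 0]])
Y_score = np.array([[0.9, 0.2], [0.8, 0.7], [0.3, 0.6], [0.1, 0.1]])
print(round(mean_average_precision(Y_true, Y_score), 4))  # 0.9167
```

In the toy example the first class has AP (1 + 2/3) / 2 and the second has AP 1, so the macro average is 11/12.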
Problem

Research questions and friction points this paper is trying to address.

Lack of granular annotated datasets for colorectal disease biomarker discovery
Limited capability of existing datasets for organ-specific in-depth studies
High expertise and cost barriers in histological tissue type annotation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Hierarchical HTT taxonomy for colorectal disease
Two-stage multilabel representation learning model
VMamba architecture achieving 0.88 mAP
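One way to picture a hierarchical multilabel annotation like the one listed above is as a multi-hot vector in which every annotated tissue type also switches on its ancestors in the taxonomy. The sketch below assumes that convention; the class codes (`T1`, `T1.1`, `T1.1.a`, ...) are hypothetical placeholders rather than the actual ADPv2 class names, and the helper functions are illustrative, not part of any released tooling.

```python
# Hypothetical three-level taxonomy as a child -> parent map.
# The codes are placeholders, NOT the real ADPv2 tissue type names.
PARENT = {
    "T1": None,         # level 1
    "T2": None,
    "T1.1": "T1",       # level 2
    "T2.1": "T2",
    "T1.1.a": "T1.1",   # level 3
    "T1.1.b": "T1.1",
}
CLASSES = sorted(PARENT)  # fixed class order used for the label vector

def with_ancestors(labels):
    """Return the given labels plus all of their ancestors in the taxonomy."""
    closed = set()
    for label in labels:
        node = label
        while node is not None:
            closed.add(node)
            node = PARENT[node]
    return closed

def multi_hot(labels):
    """Encode a patch annotation as a 0/1 vector over CLASSES (assumed convention)."""
    closed = with_ancestors(labels)
    return [1 if c in closed else 0 for c in CLASSES]

# A patch annotated with one level-3 type and one level-1 type.
print(CLASSES)
print(multi_hot(["T1.1.a", "T2"]))  # [1, 1, 1, 0, 1, 0]
```

Encoding ancestors explicitly keeps predictions at coarser levels comparable with those at finer levels, though whether ADPv2 labels are stored this way is an assumption here.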
Zhiyuan Yang
Northeastern University
computer vision, remote sensing
Kai Li
Department of Electrical & Computer Engineering, University of Toronto, 10 King’s College Rd, Toronto, ON M5S 3G8, Canada
Sophia Ghamoshi Ramandi
Department of Chemistry & Biology, Toronto Metropolitan University, 350 Victoria St., Toronto, ON M5B 2K3, Canada
Patricia Brassard
Department of Medicine, Université de Montréal, Pavillon Roger-Gaudry, 2900 Edouard Montpetit Blvd, Montreal, QC H3T 1J4, Canada
Hakim Khellaf
Department of Pathology & Molecular Medicine, Université de Montréal, 2900 Édouard-Montpetit Blvd, Montréal, QC H3T 1J4, Canada
Vincent Quoc-Huy Trinh
University of Montreal
Pathology GI Liver Pancreas
Jennifer Zhang
Department of Electrical & Computer Engineering, University of Toronto, 10 King’s College Rd, Toronto, ON M5S 3G8, Canada
Lina Chen
Department of Laboratory Medicine & Pathobiology, University of Toronto, Simcoe Hall, 1 King’s College Circle, Toronto, ON M5S 3K3, Canada
Corwyn Rowsell
Department of Laboratory Medicine & Pathobiology, University of Toronto, Simcoe Hall, 1 King’s College Circle, Toronto, ON M5S 3K3, Canada
Sonal Varma
Department of Pathology & Molecular Medicine, Queen’s University, 88 Stuart Street, Kingston, ON K7L 3N6, Canada
Kostas Plataniotis
Department of Electrical & Computer Engineering, University of Toronto, 10 King’s College Rd, Toronto, ON M5S 3G8, Canada
Mahdi S. Hosseini
Assistant Professor, Concordia University, Mila Quebec AI Institute, McGill University
Computer Vision, Deep Learning, Computational Pathology