🤖 AI Summary
Existing computational pathology datasets lack organ-specific, fine-grained histological tissue type (HTT) annotations, which limits in-depth studies of colonic diseases. To address this, we introduce ADPv2, a gastrointestinal-focused dataset comprising 20,004 image patches from healthy colon biopsy slides, annotated according to a three-level hierarchical taxonomy of 32 HTT classes. We train a multi-label representation learning model with a two-stage procedure based on the VMamba architecture, achieving a mean average precision (mAP) of 0.88 on multi-label classification of colon HTTs. ADPv2 supports colon-specific modeling: analyzing the model's predictions on tissues affected by different colon diseases reveals statistical patterns consistent with the two pathological pathways of colon cancer development. This points to the dataset's potential for biomarker discovery and for studying disease mechanisms. The dataset is publicly released.
📝 Abstract
Computational pathology (CoPath) leverages histopathology images to enhance diagnostic precision and reproducibility in clinical pathology. However, publicly available CoPath datasets annotated with extensive, fine-grained histological tissue type (HTT) taxonomies remain scarce because of the significant expertise and high annotation costs required. Existing datasets, such as the Atlas of Digital Pathology (ADP), address this by offering diverse HTT annotations generalized across multiple organs, but this generality limits in-depth studies of diseases in a specific organ. Building upon this foundation, we introduce ADPv2, a novel dataset focused on gastrointestinal histopathology. Our dataset comprises 20,004 image patches derived from healthy colon biopsy slides, annotated according to a hierarchical taxonomy of 32 distinct HTTs organized into 3 levels. Furthermore, we train a multilabel representation learning model on ADPv2 following a two-stage training procedure. Using the VMamba architecture, the model achieves a mean average precision (mAP) of 0.88 in multilabel classification of colon HTTs. Finally, we show that our dataset supports organ-specific, in-depth study for potential biomarker discovery: analyzing the model's prediction behavior on tissues affected by different colon diseases reveals statistical patterns that confirm the two pathological pathways of colon cancer development. Our dataset is publicly available here: Part 1 at https://zenodo.org/records/15307021, Part 2 at https://zenodo.org/records/15312384 and Part 3 at https://zenodo.org/records/15312792
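For readers unfamiliar with the reported metric, the following is a minimal sketch of how mean average precision can be computed for a multilabel classifier. It assumes macro averaging (one average precision per HTT class, then an unweighted mean over classes); the abstract does not state which averaging convention was used, so this is an illustration and not the authors' evaluation code. The function names and the toy data are hypothetical.

```python
from typing import List, Sequence


def average_precision(y_true: Sequence[int], y_score: Sequence[float]) -> float:
    """Average precision for one class: mean of precision@rank over the positives."""
    # Rank patches from highest to lowest predicted score (ties keep input order).
    order = sorted(range(len(y_score)), key=lambda i: y_score[i], reverse=True)
    hits = 0
    precision_sum = 0.0
    for rank, idx in enumerate(order, start=1):
        if y_true[idx] == 1:
            hits += 1
            precision_sum += hits / rank  # precision at this positive's rank
    if hits == 0:
        raise ValueError("average precision is undefined without positive labels")
    return precision_sum / hits


def mean_average_precision(
    labels: List[Sequence[int]], scores: List[Sequence[float]]
) -> float:
    """Macro mAP (an assumption): unweighted mean of per-class average precision.

    labels[i][c] is 1 if patch i carries class c, else 0.
    scores[i][c] is the predicted score of class c for patch i.
    Classes with no positive patch are skipped.
    """
    n_classes = len(labels[0])
    per_class = []
    for c in range(n_classes):
        col_true = [row[c] for row in labels]
        if sum(col_true) == 0:
            continue
        col_score = [row[c] for row in scores]
        per_class.append(average_precision(col_true, col_score))
    return sum(per_class) / len(per_class)


# Toy example: 3 patches, 2 hypothetical tissue classes.
toy_labels = [[1, 0], [0, 1], [1, 1]]
toy_scores = [[0.9, 0.2], [0.1, 0.8], [0.8, 0.3]]
toy_map = mean_average_precision(toy_labels, toy_scores)
```

In the toy example every positive patch is ranked above every negative one for both classes, so each per-class average precision is 1.0. Library implementations may handle tied scores differently from this sketch.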