Dynamic Residual Encoding with Slide-Level Contrastive Learning for End-to-End Whole Slide Image Representation

Published: 2025-10-27
๐Ÿ›๏ธ Proceedings of the 33rd ACM International Conference on Multimedia
Citations: 0
Influential: 0
AI Summary
End-to-end representation learning for whole-slide images (WSIs) faces severe memory and computational bottlenecks due to gigapixel-scale data. To address this, we propose a dynamic residual encoding framework coupled with slide-level contrastive learning. Our method introduces a tile encoder and a residual feature fusion module, augmented by a memory bank and sliding-window sampling. This enables, for the first time, end-to-end contrastive learning directly on full WSIs while effectively capturing global contextual dependencies. Unlike conventional random tile sampling, our approach preserves spatially coherent information without sacrificing training efficiency, thereby enhancing representation discriminability. We evaluate on three core computational pathology tasks: cancer subtype classification, malignancy detection, and genomic mutation prediction. Our method consistently outperforms state-of-the-art approaches, demonstrating superior effectiveness and strong generalization across diverse histopathological benchmarks.

๐Ÿ“ Abstract
Whole Slide Image (WSI) representation is critical for cancer subtyping, cancer recognition, and mutation prediction. Training an end-to-end WSI representation model poses significant challenges: a standard gigapixel slide can contain tens of thousands of image tiles, so the gradients of all tiles cannot be computed in a single mini-batch under current GPU limitations. To address this challenge, we propose dynamic residual encoding with slide-level contrastive learning (DRE-SLCL) for end-to-end WSI representation. Our approach uses a memory bank to store the features of tiles across all WSIs in the dataset. During training, a mini-batch usually contains multiple WSIs. For each WSI in the batch, a subset of tiles is randomly sampled and their features are computed using a tile encoder. Then, additional tile features from the same WSI are selected from the memory bank. The representation of each individual WSI is generated using a residual encoding technique that incorporates both the sampled features and those retrieved from the memory bank. Finally, the slide-level contrastive loss is computed from the representations and histopathology reports of the WSIs within the mini-batch. Experiments on cancer subtyping, cancer recognition, and mutation prediction tasks demonstrate the effectiveness of the proposed DRE-SLCL method.
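The per-slide step described in the abstract (sample some tiles, encode them, pull additional features for the same slide from the memory bank, then aggregate) can be sketched roughly as below. This is an illustrative sketch, not the authors' implementation. The abstract does not specify the form of the residual encoding, so a VLAD-style sum of residuals to a set of codewords is assumed here. The names `MemoryBank`, `residual_encode`, `slide_representation`, `n_fresh`, `n_bank`, and the `encoder` callable are all hypothetical, and plain Python lists stand in for tensors.

```python
import random


class MemoryBank:
    """Hypothetical store of the latest known feature for every tile of every slide."""

    def __init__(self):
        self.features = {}  # (slide_id, tile_idx) -> feature vector

    def update(self, slide_id, tile_idx, feature):
        self.features[(slide_id, tile_idx)] = list(feature)

    def fetch(self, slide_id, exclude, k, rng):
        # Pick up to k stored features of this slide, skipping freshly encoded tiles.
        candidates = [idx for (sid, idx) in self.features
                      if sid == slide_id and idx not in exclude]
        chosen = rng.sample(candidates, min(k, len(candidates)))
        return [self.features[(slide_id, idx)] for idx in chosen]


def residual_encode(features, codewords):
    """Assumed VLAD-style encoding: sum residuals to the nearest codeword, then L2-normalize."""
    dim = len(codewords[0])
    acc = [[0.0] * dim for _ in codewords]
    for f in features:
        nearest = min(
            range(len(codewords)),
            key=lambda c: sum((a - b) ** 2 for a, b in zip(f, codewords[c])),
        )
        for d in range(dim):
            acc[nearest][d] += f[d] - codewords[nearest][d]
    flat = [v for row in acc for v in row]
    norm = sum(v * v for v in flat) ** 0.5
    return [v / norm for v in flat] if norm > 0 else flat


def slide_representation(slide_id, tiles, encoder, bank, codewords, n_fresh, n_bank, rng):
    # 1) Randomly sample a subset of tiles and encode them (the part that would carry gradients).
    fresh_idx = rng.sample(range(len(tiles)), min(n_fresh, len(tiles)))
    fresh = [encoder(tiles[i]) for i in fresh_idx]
    # 2) Retrieve additional, previously stored features of the same slide.
    stored = bank.fetch(slide_id, set(fresh_idx), n_bank, rng)
    # 3) Refresh the bank with the newly computed features.
    for i, f in zip(fresh_idx, fresh):
        bank.update(slide_id, i, f)
    # 4) Aggregate fresh and stored features into one slide-level vector.
    return residual_encode(fresh + stored, codewords)
```

In a real training loop only the freshly encoded tiles would contribute gradients to the tile encoder, while the stored features would act as constants; that split is what keeps memory use bounded in a scheme of this kind.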
Problem

Research questions and friction points this paper is trying to address.

Addresses computational challenges in gigapixel whole slide image representation
Enables end-to-end training despite GPU memory constraints
Improves cancer subtyping, recognition and mutation prediction accuracy
Innovation

Methods, ideas, or system contributions that make the work stand out.

Dynamic residual encoding for WSI representation
Memory bank stores tile features across slides
Slide-level contrastive learning with pathology reports
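As a rough illustration of the last point, a slide-level contrastive loss over slide representations and report embeddings might look like the sketch below. The listing does not give the exact loss, so a symmetric InfoNCE objective (in the style of CLIP) is assumed. The function names, the `temperature` value, and the idea that report embeddings arrive precomputed from some text encoder are all assumptions made for illustration.

```python
import math


def l2_normalize(v):
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def cross_entropy_diagonal(logits):
    """Mean cross-entropy where row i's correct class is column i."""
    total = 0.0
    for i, row in enumerate(logits):
        m = max(row)  # subtract the max for a numerically stable log-sum-exp
        lse = m + math.log(sum(math.exp(x - m) for x in row))
        total += lse - row[i]
    return total / len(logits)


def slide_report_contrastive_loss(slide_embs, report_embs, temperature=0.07):
    """Assumed symmetric InfoNCE: slide i and report i form the only positive pair."""
    s = [l2_normalize(v) for v in slide_embs]
    r = [l2_normalize(v) for v in report_embs]
    # Cosine similarity of every slide with every report, scaled by the temperature.
    logits = [[sum(a * b for a, b in zip(si, rj)) / temperature for rj in r] for si in s]
    transposed = [list(col) for col in zip(*logits)]
    # Average the slide-to-report and report-to-slide directions.
    return 0.5 * (cross_entropy_diagonal(logits) + cross_entropy_diagonal(transposed))


# Usage: matched pairs should give a lower loss than mismatched pairs.
slides = [[1.0, 0.0], [0.0, 1.0]]
loss_matched = slide_report_contrastive_loss(slides, [[1.0, 0.0], [0.0, 1.0]])
loss_mismatched = slide_report_contrastive_loss(slides, [[0.0, 1.0], [1.0, 0.0]])
```

Because the loss is computed across all WSIs in the mini-batch, each slide's report serves as the positive and the other slides' reports as negatives under this assumed formulation.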
Jing Jin
School of Computer Science and Engineering, Central South University, Changsha, China
Xu Liu
School of Computer Science and Engineering, Central South University, Changsha, China
Te Gao
School of Computer Science and Engineering, Central South University, Changsha, China
Zhihong Shi
School of Computer Science and Engineering, Central South University, Changsha, China
Yixiong Liang
School of Computer Science and Engineering, Central South University, Changsha, China
Ruiqing Zheng
School of Computer Science and Engineering, Central South University, Changsha, China
Hulin Kuang
Central South University
medical image processing, intelligent transportation systems, deep learning, machine learning
Min Zeng
School of Computer Science and Engineering, Central South University
Bioinformatics, Machine Learning, Deep Learning
Shichao Kan
Central South University
Large Vision Language Model, Deep Metric Learning, Image Retrieval, Object Retrieval