Dynamic Residual Encoding with Slide-Level Contrastive Learning for End-to-End Whole Slide Image Representation

📅 2025-10-27

🏛️ Proceedings of the 33rd ACM International Conference on Multimedia

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

End-to-end representation learning for whole-slide images (WSIs) faces severe memory and computational bottlenecks because each slide is gigapixel-scale. To address this, we propose a dynamic residual encoding framework coupled with slide-level contrastive learning. Our method introduces a tile encoder and a residual feature fusion module, augmented by a memory bank and sliding-window sampling. Together these enable, for the first time, end-to-end contrastive learning directly on full WSIs while capturing global contextual dependencies. Unlike conventional random tile sampling, our approach preserves spatially coherent information without sacrificing training efficiency, which improves the discriminability of the learned representations. We evaluate on three core computational pathology tasks: cancer subtype classification, malignancy detection, and genomic mutation prediction. Our method consistently outperforms state-of-the-art approaches and generalizes well across diverse histopathological benchmarks.

📝 Abstract

Whole Slide Image (WSI) representation is critical for cancer subtyping, cancer recognition, and mutation prediction. Training an end-to-end WSI representation model poses significant challenges: a standard gigapixel slide can contain tens of thousands of image tiles, so computing gradients for all tiles in a single mini-batch is infeasible under current GPU memory limits. To address this challenge, we propose dynamic residual encoding with slide-level contrastive learning (DRE-SLCL) for end-to-end WSI representation. Our approach uses a memory bank to store the features of tiles across all WSIs in the dataset. During training, a mini-batch usually contains multiple WSIs. For each WSI in the batch, a subset of tiles is randomly sampled and their features are computed with a tile encoder. Then, additional tile features from the same WSI are selected from the memory bank. The representation of each WSI is generated by a residual encoding technique that incorporates both the freshly sampled features and those retrieved from the memory bank. Finally, the slide-level contrastive loss is computed from the representations and the histopathology reports of the WSIs within the mini-batch. Experiments on cancer subtyping, cancer recognition, and mutation prediction tasks demonstrate the effectiveness of the proposed DRE-SLCL method.
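The per-slide step described in the abstract (sample some tiles, encode them, pull more features for the same slide from the memory bank, then aggregate) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the abstract does not give the exact form of the residual encoding, so a VLAD-style soft-assigned residual aggregation is assumed here, and the names `encode_tiles`, `residual_encode`, `slide_representation`, `n_fresh`, and `n_bank` are hypothetical. Writing the fresh features back into the memory bank is also an assumption.

```python
import math
import random


def encode_tiles(tiles):
    # Hypothetical stand-in for the trainable tile encoder. In this toy
    # example the "tiles" are already feature vectors, so we just copy them.
    return [list(t) for t in tiles]


def residual_encode(features, codewords, scale=1.0):
    # Assumed VLAD-style encoding: each feature contributes its residual to
    # every codeword, weighted by a softmax over negative squared distances.
    dim = len(codewords[0])
    agg = [[0.0] * dim for _ in codewords]
    for x in features:
        residuals = [[xi - ci for xi, ci in zip(x, c)] for c in codewords]
        logits = [-scale * sum(r * r for r in res) for res in residuals]
        top = max(logits)  # subtract the max for numerical stability
        weights = [math.exp(l - top) for l in logits]
        total = sum(weights)
        for k, res in enumerate(residuals):
            for d in range(dim):
                agg[k][d] += weights[k] / total * res[d]
    flat = [v for row in agg for v in row]
    norm = math.sqrt(sum(v * v for v in flat)) or 1.0
    return [v / norm for v in flat]  # L2-normalised slide vector


def slide_representation(slide_id, tiles, memory_bank, codewords,
                         n_fresh, n_bank, rng):
    # 1) Randomly sample a small subset of tiles and encode them now.
    #    In real training, only these would carry gradients.
    fresh_idx = rng.sample(range(len(tiles)), n_fresh)
    fresh = encode_tiles([tiles[i] for i in fresh_idx])
    # 2) Retrieve cached features for other tiles of the same slide.
    chosen = set(fresh_idx)
    rest = [i for i in range(len(tiles)) if i not in chosen]
    bank_idx = rng.sample(rest, min(n_bank, len(rest)))
    cached = [memory_bank[slide_id][i] for i in bank_idx]
    # 3) Refresh the bank with the new features (assumed behaviour).
    for i, f in zip(fresh_idx, fresh):
        memory_bank[slide_id][i] = f
    # 4) Aggregate fresh and cached features into one slide vector.
    return residual_encode(fresh + cached, codewords)


rng = random.Random(0)
tiles = [[rng.gauss(0.0, 1.0) for _ in range(4)] for _ in range(20)]
memory_bank = {"slide_0": [list(t) for t in tiles]}  # pre-filled cache
codewords = [[rng.gauss(0.0, 1.0) for _ in range(4)] for _ in range(3)]
rep = slide_representation("slide_0", tiles, memory_bank, codewords, 5, 10, rng)
print(len(rep))  # 12 = 3 codewords x 4 feature dimensions
```

The point of the split is that only `n_fresh` tiles per slide pass through the encoder in a given step, while the cached features still let the aggregated vector reflect a larger portion of the slide.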

Problem

Research questions and friction points this paper is trying to address.

Addresses computational challenges in gigapixel whole slide image representation

Enables end-to-end training despite GPU memory constraints

Improves cancer subtyping, recognition and mutation prediction accuracy

Innovation

Methods, ideas, or system contributions that make the work stand out.

Dynamic residual encoding for WSI representation

Memory bank stores tile features across slides

Slide-level contrastive learning with pathology reports
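To illustrate the slide-level contrastive idea, the sketch below pairs each slide representation with the embedding of its pathology report and scores the batch with a symmetric InfoNCE loss. This is an assumption for illustration only: the source states that the loss uses slide representations and reports within a mini-batch but does not specify its form, and the function name `slide_report_contrastive_loss` and the temperature value are hypothetical.

```python
import math


def l2_normalize(v):
    n = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / n for x in v]


def log_softmax(row):
    top = max(row)  # subtract the max for numerical stability
    lse = top + math.log(sum(math.exp(x - top) for x in row))
    return [x - lse for x in row]


def slide_report_contrastive_loss(slide_embs, report_embs, temperature=0.07):
    # Assumed symmetric InfoNCE: slide i and report i form the positive
    # pair; every other report or slide in the batch acts as a negative.
    s = [l2_normalize(v) for v in slide_embs]
    r = [l2_normalize(v) for v in report_embs]
    n = len(s)
    logits = [[sum(a * b for a, b in zip(s[i], r[j])) / temperature
               for j in range(n)] for i in range(n)]
    # Slide-to-report direction: softmax over each row.
    s2r = -sum(log_softmax(logits[i])[i] for i in range(n)) / n
    # Report-to-slide direction: softmax over each column.
    cols = [[logits[i][j] for i in range(n)] for j in range(n)]
    r2s = -sum(log_softmax(cols[j])[j] for j in range(n)) / n
    return 0.5 * (s2r + r2s)


slides = [[1.0, 0.0], [0.0, 1.0]]
matched = slide_report_contrastive_loss(slides, [[1.0, 0.0], [0.0, 1.0]])
swapped = slide_report_contrastive_loss(slides, [[0.0, 1.0], [1.0, 0.0]])
print(matched < swapped)  # True: aligned pairs give the lower loss
```

Because the loss is computed over whole-slide vectors rather than individual tiles, the batch size that matters is the number of slides, which is what makes the memory-bank shortcut useful.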
