🤖 AI Summary
To address a key limitation of whole-slide image (WSI) analysis, namely that tile representations lack global context and are hard to optimize jointly for local- and slide-level tasks, this paper proposes TICON, a Transformer-based universal tile contextualizer. Its core innovation is the first slide-level masked tile modeling pretraining paradigm, which provides encoder-agnostic, unified enhancement of the outputs of any tile encoder via cross-model embedding alignment and a lightweight slide-level aggregator. Pretrained on only 11K WSIs, TICON surpasses state-of-the-art (SOTA) slide-level aggregators trained on 350K WSIs. It establishes new SOTA performance on the tile-level benchmarks HEST-Bench, THUNDER, and CATCH and on the slide-level benchmark Patho-Bench, significantly improving downstream tasks including classification, segmentation, and survival prediction.
📝 Abstract
The interpretation of small tiles in large whole-slide images (WSIs) often requires broader image context. We introduce TICON, a transformer-based tile representation contextualizer that produces rich, contextualized embeddings for "any" application in computational pathology. Standard tile-encoder-based pipelines, which extract embeddings of tiles stripped from their context, fail to model the rich slide-level information essential for both local and global tasks. Furthermore, different tile encoders excel at different downstream tasks, so a unified model is needed to contextualize embeddings derived from "any" tile-level foundation model. TICON addresses this need with a single, shared encoder, pretrained using a masked modeling objective to simultaneously unify and contextualize representations from diverse tile-level pathology foundation models. Our experiments demonstrate that TICON-contextualized embeddings significantly improve performance across many different tasks, establishing new state-of-the-art results on tile-level benchmarks (i.e., HEST-Bench, THUNDER, CATCH) and slide-level benchmarks (i.e., Patho-Bench). Finally, we pretrain an aggregator on TICON to form a slide-level foundation model using only 11K WSIs, which outperforms SoTA slide-level foundation models pretrained with up to 350K WSIs.
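
To make the masked modeling idea concrete, below is a minimal, hypothetical sketch of slide-level masked tile modeling over pre-extracted tile embeddings. All module names, dimensions, the masking ratio, and the reconstruction loss are illustrative assumptions for exposition, not the paper's actual architecture or training configuration.

```python
# Hypothetical sketch: contextualize pre-extracted tile embeddings with a
# transformer and train it by reconstructing randomly masked tiles.
# Dimensions, masking ratio, and loss are assumptions, not TICON's real config.
import torch
import torch.nn as nn

class MaskedTileContextualizer(nn.Module):
    def __init__(self, tile_dim=1024, model_dim=512, depth=4, heads=8, mask_ratio=0.5):
        super().__init__()
        # Project any tile encoder's output into a shared embedding space
        self.proj = nn.Linear(tile_dim, model_dim)
        self.mask_token = nn.Parameter(torch.zeros(1, 1, model_dim))
        layer = nn.TransformerEncoderLayer(model_dim, heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, depth)
        # Reconstruct the original tile embedding from the contextualized one
        self.decoder = nn.Linear(model_dim, tile_dim)
        self.mask_ratio = mask_ratio

    def forward(self, tiles):
        # tiles: (batch, n_tiles, tile_dim) embeddings from a frozen tile encoder
        x = self.proj(tiles)
        b, n, d = x.shape
        # Randomly mask a subset of tiles and replace them with a learned token
        mask = torch.rand(b, n, device=x.device) < self.mask_ratio
        x = torch.where(mask.unsqueeze(-1), self.mask_token.expand(b, n, d), x)
        # Self-attention over the whole slide provides global context per tile
        ctx = self.encoder(x)
        recon = self.decoder(ctx)
        # Reconstruction loss only on masked positions
        loss = ((recon - tiles) ** 2)[mask].mean()
        return ctx, loss
```

In this sketch, the contextualized outputs `ctx` would serve tile-level tasks directly, while a separate lightweight aggregator pooled over them would produce a slide-level representation; positional information for tile coordinates is omitted here for brevity.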