Any-to-Any Learning in Computational Pathology via Triplet Multimodal Pretraining

📅 2025-05-19
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Addressing the challenges of fusing heterogeneous multimodal data (whole-slide images, genomics, and pathology reports) in computational pathology—namely, difficulty in cross-modal integration, poor robustness under modality missing, and lack of a unified modeling framework for both unimodal and multimodal tasks—this paper proposes the first any-to-any trimodal pretraining framework. Our approach integrates contrastive learning with masked reconstruction pretraining, introduces a modality-aware attention mechanism, and employs a progressive cross-modal alignment loss to enable dynamic modality composition and robust learning under partial modality absence, thereby departing from the conventional WSI-centric paradigm. Evaluated on four downstream tasks—survival prediction, cancer subtype classification, genomic mutation inference, and pathology report generation—the framework achieves state-of-the-art or superior performance across all. Under modality-missing scenarios, it yields an average AUC improvement of 4.2%, significantly enhancing cross-modal representation consistency and generalization capability.

Technology Category

Application Category

📝 Abstract
Recent advances in computational pathology and artificial intelligence have significantly enhanced the utilization of gigapixel whole-slide images and and additional modalities (e.g., genomics) for pathological diagnosis. Although deep learning has demonstrated strong potential in pathology, several key challenges persist: (1) fusing heterogeneous data types requires sophisticated strategies beyond simple concatenation due to high computational costs; (2) common scenarios of missing modalities necessitate flexible strategies that allow the model to learn robustly in the absence of certain modalities; (3) the downstream tasks in CPath are diverse, ranging from unimodal to multimodal, cnecessitating a unified model capable of handling all modalities. To address these challenges, we propose ALTER, an any-to-any tri-modal pretraining framework that integrates WSIs, genomics, and pathology reports. The term"any"emphasizes ALTER's modality-adaptive design, enabling flexible pretraining with any subset of modalities, and its capacity to learn robust, cross-modal representations beyond WSI-centric approaches. We evaluate ALTER across extensive clinical tasks including survival prediction, cancer subtyping, gene mutation prediction, and report generation, achieving superior or comparable performance to state-of-the-art baselines.
Problem

Research questions and friction points this paper is trying to address.

Fusing heterogeneous data types efficiently in computational pathology
Handling missing modalities robustly in diagnostic models
Creating a unified model for diverse downstream tasks
Innovation

Methods, ideas, or system contributions that make the work stand out.

Triplet multimodal pretraining integrates WSIs, genomics, reports
Modality-adaptive design handles any subset of modalities
Robust cross-modal representations for diverse clinical tasks
🔎 Similar Papers
Q
Qichen Sun
Peking University
Z
Zhengrui Guo
The Hong Kong University of Science and Technology
R
Rui Peng
Peking University
H
Hao Chen
The Hong Kong University of Science and Technology
Jinzhuo Wang
Jinzhuo Wang
Peking University