AI Summary
This work proposes ProtoPathway, an end-to-end multimodal framework for accurate, biologically interpretable cancer survival prediction that integrates histopathology images and transcriptomic data. The approach represents pathology images through learnable morphological prototypes and embeds gene expression using a bipartite graph neural network with bidirectional message passing over the Reactome gene-pathway graph. Cross-modal attention, in which prototypes query pathways, aligns the two representations along a biologically motivated direction. By jointly modeling morphological prototypes and pathway-level gene information, the method provides attribution spanning genes, pathways, prototypes, and spatial tissue maps. Evaluated on five TCGA cancer cohorts, the model matches or exceeds the performance of existing methods while offering substantially improved interpretability and reduced computational cost.
Abstract
We introduce ProtoPathway, an interpretable-by-design multimodal framework for cancer survival prediction that unifies whole slide imaging and transcriptomics through encoders producing biologically grounded representations on both sides of the fusion. On the histopathology side, $K$ learnable morphological prototypes, trained end-to-end with the survival objective, serve as the slide representation itself: patches flow into prototype tokens via soft assignment, compressing variable-length patch sets into fixed task-adaptive tokens. On the genomic side, a bipartite graph neural network encodes gene expression within the Reactome pathway hierarchy, producing pathway embeddings that reflect both constituent genes and their broader biological context through bidirectional message passing over a shared gene--pathway graph. Cross-modal attention then operates over a compact prototype $\times$ pathway matrix in which prototypes query pathways, modeling the biological direction in which molecular programs give rise to tissue morphology. Because both axes carry stable task-learned identity, the attention matrix is itself an interpretability output, yielding native inference-time attribution across the full biological hierarchy, from genes through pathways and prototypes to spatial tissue maps. We evaluate on five TCGA cancer cohorts, demonstrating competitive or superior survival prediction with substantially improved biological interpretability and reduced computational cost, with interpretability claims validated through fold-stratified rank-based population-level analysis. Our source code, model weights, and Reactome pathways, together with a unified codebase reimplementing all multimodal survival baselines under identical preprocessing and evaluation, are available at: https://github.com/AmayaGS/ProtoPathway.
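The two mechanisms described above, soft assignment of patches to $K$ prototypes and attention in which prototypes query pathways, can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. It assumes dot-product similarity, no learned projections, and mass-normalized pooling, and every function and variable name (`prototype_tokens`, `prototype_pathway_attention`, `temperature`, the random inputs) is hypothetical.

```python
import numpy as np


def softmax(x, axis):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def prototype_tokens(patches, prototypes, temperature=1.0):
    """Pool a variable-length patch set (N, d) into K fixed tokens (K, d).

    Assumption: similarity is a plain dot product; the paper may use a
    different similarity or learned projections.
    """
    logits = patches @ prototypes.T / temperature      # (N, K)
    assign = softmax(logits, axis=1)                   # each patch spreads unit mass over K prototypes
    mass = assign.sum(axis=0)                          # (K,) total mass received per prototype
    tokens = (assign.T @ patches) / (mass[:, None] + 1e-8)  # weighted mean of patches per prototype
    return tokens, assign


def prototype_pathway_attention(tokens, pathways):
    """Prototypes act as queries, pathway embeddings as keys and values."""
    d = tokens.shape[1]
    scores = tokens @ pathways.T / np.sqrt(d)          # (K, P) prototype x pathway matrix
    attn = softmax(scores, axis=1)                     # each prototype distributes attention over pathways
    fused = attn @ pathways                            # (K, d) pathway-informed prototype tokens
    return fused, attn


# Random stand-ins for patch features, prototypes, and pathway embeddings.
rng = np.random.default_rng(0)
N, K, P, d = 500, 8, 50, 16
patches = rng.normal(size=(N, d))
prototypes = rng.normal(size=(K, d))
pathways = rng.normal(size=(P, d))

tokens, assign = prototype_tokens(patches, prototypes)
fused, attn = prototype_pathway_attention(tokens, pathways)

# Chaining the two row-stochastic matrices gives a per-patch pathway
# attribution, one way a spatial map could be derived from the attention.
patch_to_pathway = assign @ attn                       # (N, P)
```

In this sketch the token count depends only on `K`, not on the number of patches, and `attn` is a small `K` by `P` matrix whose rows and columns keep a fixed meaning across slides, which is the property the abstract relies on for attribution.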