🤖 AI Summary
This work addresses the challenge of modeling prognostic events that are sparse, patient-specific, and unlabeled in multimodal cancer survival prediction. The authors propose SlotSPE, a slot-based factorized representation learning framework that compresses histopathological images and gene expression data into interpretable slot representations, explicitly capturing high-order cross-modal interactions. SlotSPE incorporates biological prior constraints and a structured event decomposition mechanism, and uses slot attention for modality-specific feature compression, which keeps inference robust even when gene expression data are missing. Evaluated on 10 TCGA cancer cohorts, SlotSPE outperforms state-of-the-art methods in 8 cohorts, with an average C-index improvement of 2.9%. The framework offers interpretability, through disentangled and biologically grounded slot representations, together with robustness to missing modalities, advancing reliable and explainable multimodal survival analysis.
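For readers unfamiliar with the reported metric, the concordance index (C-index) measures how often a model ranks a pair of patients in the correct order of survival. The following is a minimal sketch of Harrell's C-index in plain Python; the function name, argument names, and toy data are illustrative assumptions and are not taken from the paper, whose exact evaluation protocol is not described here.

```python
def concordance_index(times, events, risks):
    """Harrell's C-index for right-censored survival data.

    times  : observed follow-up time per patient
    events : 1 if the event (death) was observed, 0 if censored
    risks  : predicted risk score (higher means shorter expected survival)
    """
    concordant, comparable = 0.0, 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored patient cannot anchor a comparable pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0   # earlier event correctly given higher risk
                elif risks[i] == risks[j]:
                    concordant += 0.5   # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy data (hypothetical): four patients, the third one is censored.
times = [2, 4, 6, 8]
events = [1, 1, 0, 1]
risks = [0.9, 0.3, 0.5, 0.1]
print(concordance_index(times, events, risks))  # 0.8
```

A value of 0.5 corresponds to random ranking and 1.0 to a perfect ranking, so gains of a few points on this scale can be meaningful.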
📝 Abstract
The integration of histology images and gene profiles has shown great promise for improving survival prediction in cancer. However, current approaches often struggle to model intra- and inter-modal interactions efficiently and effectively due to the high dimensionality and complexity of the inputs. A major challenge is capturing critical prognostic events that, though few, underlie the complexity of the observed inputs and largely determine patient outcomes. These events, manifested as high-level structural signals such as spatial histologic patterns or pathway co-activations, are typically sparse, patient-specific, and unannotated, making them inherently difficult to uncover. To address this, we propose SlotSPE, a slot-based framework for structural prognostic event modeling. Specifically, inspired by the principle of factorial coding, we compress each patient's multimodal inputs into compact, modality-specific sets of mutually distinctive slots using slot attention. By leveraging these slot representations as encodings for prognostic events, our framework enables both efficient and effective modeling of complex intra- and inter-modal interactions, while also facilitating seamless incorporation of biological priors that enhance prognostic relevance. Extensive experiments on ten cancer benchmarks show that SlotSPE outperforms existing methods in 8 out of 10 cohorts, achieving an overall improvement of 2.9%. It remains robust under missing genomic data and delivers markedly improved interpretability through structured event decomposition.
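The compression step described above can be illustrated with a simplified slot attention routine. This is a sketch under stated assumptions, not SlotSPE's implementation: the projections are random and untrained, the slot update is a plain weighted mean rather than the GRU and MLP update of the original slot attention formulation, and all names, shapes, and the random stand-in data are hypothetical.

```python
import numpy as np


def slot_attention(inputs, num_slots=4, iters=3, seed=0, eps=1e-8):
    """Compress (n_tokens, dim) features into (num_slots, dim) slots.

    Simplified for illustration: untrained random projections and a
    weighted-mean slot update (no GRU or MLP refinement).
    """
    rng = np.random.default_rng(seed)
    n, d = inputs.shape
    # Hypothetical, untrained projections for queries, keys, and values.
    w_q, w_k, w_v = (rng.normal(scale=d ** -0.5, size=(d, d)) for _ in range(3))
    slots = rng.normal(size=(num_slots, d))
    keys, values = inputs @ w_k, inputs @ w_v
    for _ in range(iters):
        queries = slots @ w_q
        logits = keys @ queries.T / np.sqrt(d)          # (n_tokens, num_slots)
        logits -= logits.max(axis=1, keepdims=True)     # numerical stability
        attn = np.exp(logits)
        attn /= attn.sum(axis=1, keepdims=True)         # softmax over slots: slots compete for each token
        weights = attn / (attn.sum(axis=0, keepdims=True) + eps)  # normalize over tokens per slot
        slots = weights.T @ values                      # weighted mean of values per slot
    return slots, attn


# Stand-in for one patient's patch embeddings (random data, for shape only).
patches = np.random.default_rng(1).normal(size=(500, 32))
slots, attn = slot_attention(patches)
print(slots.shape, attn.shape)  # (4, 32) (500, 4)
```

The key design choice is that the softmax runs over the slot axis rather than the token axis, so each input token distributes its attention across slots. That competition is what pushes slots toward mutually distinctive content, and it reduces hundreds of tokens to a handful of vectors whose pairwise interactions are cheap to model.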