AI Summary
This study addresses a gap in existing cancer prognosis methods, which often overlook the proteome as an intermediary linking genomic alterations to histopathological morphology and fail to model the biological hierarchy inherent in multi-omics data. To bridge this gap, the authors propose HFGPI, a hierarchical fusion framework that explicitly incorporates the proteome as a mediating modality, establishing a cascaded modeling pathway from genes to proteins to pathology images. HFGPI employs a Molecular Tokenizer for molecular encoding, introduces a Gene-Regulated Protein Fusion (GRPF) module to explicitly capture gene-protein regulatory relationships, and leverages Protein-Guided Hypergraph Learning (PGHL) to model high-order associations between proteins and tissue morphology. Evaluated on five benchmark datasets, HFGPI outperforms current state-of-the-art methods in cancer survival prediction.
Abstract
To enhance the precision of cancer prognosis, recent research has increasingly focused on multimodal survival methods by integrating genomic data and histology images. However, current approaches overlook the fact that the proteome serves as an intermediate layer bridging genomic alterations and histopathological features while providing complementary biological information essential for survival prediction. This biological reality exposes another architectural limitation: existing integrative analysis studies fuse these heterogeneous data sources in a flat manner that fails to capture their inherent biological hierarchy. To address these limitations, we propose HFGPI, a hierarchical fusion framework that models the biological progression from genes to proteins to histology images from a systems biology perspective. Specifically, we introduce Molecular Tokenizer, a molecular encoding strategy that integrates identity embeddings with expression profiles to construct biologically informed representations for genes and proteins. We then develop Gene-Regulated Protein Fusion (GRPF), which employs graph-aware cross-attention with structure-preserving alignment to explicitly model gene-protein regulatory relationships and generate gene-regulated protein representations. Additionally, we propose Protein-Guided Hypergraph Learning (PGHL), which establishes associations between proteins and image patches, leveraging hypergraph convolution to capture higher-order protein-morphology relationships. The final features are progressively fused across hierarchical layers to achieve precise survival outcome prediction. Extensive experiments on five benchmark datasets demonstrate the superiority of HFGPI over state-of-the-art methods.
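To make the protein-to-patch hypergraph idea concrete, the following is a minimal sketch and not the authors' implementation. It assumes, purely for illustration, that each protein defines one hyperedge containing the `k` image patches whose features have the highest dot-product similarity to that protein's embedding, and it replaces learnable hypergraph convolution with a weight-free two-stage mean aggregation (patches to hyperedges, then hyperedges back to patches). The function names `build_hyperedges` and `hypergraph_conv`, the similarity rule, and the toy vectors are all hypothetical.

```python
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def mean(vectors: List[Vector]) -> Vector:
    """Element-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def build_hyperedges(proteins: List[Vector], patches: List[Vector], k: int) -> List[List[int]]:
    """One hyperedge per protein: indices of the k most similar patches.

    Assumption: dot-product similarity is the association rule; the source
    does not specify how protein-patch associations are computed.
    """
    edges = []
    for p in proteins:
        ranked = sorted(range(len(patches)), key=lambda i: dot(p, patches[i]), reverse=True)
        edges.append(sorted(ranked[:k]))
    return edges


def hypergraph_conv(patches: List[Vector], edges: List[List[int]]) -> List[Vector]:
    """Simplified hypergraph convolution without learnable weights.

    Stage 1: each hyperedge feature is the mean of its member patches.
    Stage 2: each patch becomes the mean of its incident hyperedge features;
    a patch that belongs to no hyperedge keeps its original features.
    """
    edge_feats = [mean([patches[i] for i in e]) for e in edges]
    out = []
    for i, x in enumerate(patches):
        incident = [edge_feats[j] for j, e in enumerate(edges) if i in e]
        out.append(mean(incident) if incident else list(x))
    return out


# Toy data: two protein embeddings and four patch features in a shared 2-D space.
proteins = [[1.0, 0.0], [0.0, 1.0]]
patches = [[1.0, 0.0], [0.8, 0.2], [0.0, 1.0], [0.2, 0.8]]

edges = build_hyperedges(proteins, patches, k=2)
updated = hypergraph_conv(patches, edges)
print(edges)
print(updated)
```

In this toy setting the first protein groups patches 0 and 1 and the second groups patches 2 and 3, so after one aggregation step the patches within a group share a smoothed feature. A trained model would typically add learnable projections and degree normalization, and could let a patch belong to several protein hyperedges at once, which is where the higher-order structure becomes useful.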