🤖 AI Summary
This paper addresses the challenge of robust multimodal cancer survival prediction when genomic data are missing, as frequently happens in clinical practice. To this end, we propose LD-CVAE, a Conditional Latent Differentiation Variational AutoEncoder framework. First, a Variational Information Bottleneck Transformer (VIB-Trans) learns compact, discriminative representations from gigapixel whole-slide images (WSIs). Second, a novel Latent Differentiation Variational AutoEncoder (LD-VAE) learns common and function-specific posteriors, so that genomic embeddings spanning diverse biological function categories can be generated within a single framework. The method integrates variational inference, conditional generative modeling, and a product-of-experts (PoE) fusion that combines the genomic common posterior with the image posterior to estimate the joint latent distribution. Extensive evaluation across five cancer cohorts demonstrates that LD-CVAE significantly outperforms state-of-the-art methods under genomic missingness, while also achieving state-of-the-art performance with complete modalities. To our knowledge, it is the first framework that offers both high robustness to missing genomic data and functionally differentiated cross-modal survival prediction.
📝 Abstract
The integrative analysis of histopathological images and genomic data has received increasing attention for survival prediction of human cancers. However, existing studies typically assume that all modalities are available. In practice, collecting genomic data is costly, so genomic data may be unavailable for test samples. A common way of tackling such incompleteness is to generate the genomic representations from the pathology images. Nevertheless, such a strategy still faces two challenges: (1) gigapixel whole slide images (WSIs) are enormous and thus difficult to represent; (2) it is difficult to generate genomic embeddings with diverse function categories in a unified generative framework. To address these challenges, we propose a Conditional Latent Differentiation Variational AutoEncoder (LD-CVAE) for robust multimodal survival prediction, even with missing genomic data. Specifically, a Variational Information Bottleneck Transformer (VIB-Trans) module is proposed to learn compressed pathological representations from the gigapixel WSIs. To generate different functional genomic features, we develop a novel Latent Differentiation Variational AutoEncoder (LD-VAE) to learn the common and specific posteriors for the genomic embeddings with diverse functions. Finally, we use the product-of-experts technique to integrate the genomic common posterior and the image posterior for joint latent distribution estimation in LD-CVAE. We test the effectiveness of our method on five different cancer datasets, and the experimental results demonstrate its superiority in both complete and missing modality scenarios.
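The abstract does not spell out the product-of-experts fusion, but for diagonal-Gaussian posteriors (the usual VAE setting) the product of experts has a well-known closed form: the joint precision is the sum of the experts' precisions, and the joint mean is the precision-weighted average of their means. The sketch below illustrates this standard formulation; the function name and the use of a standard-normal prior expert are illustrative assumptions, not the paper's exact implementation.

```python
import numpy as np

def product_of_experts(mus, logvars):
    """Fuse diagonal-Gaussian experts via a closed-form product of experts.

    mus, logvars: arrays of shape (n_experts, latent_dim), one row per
    expert (e.g. the image posterior and the genomic common posterior).
    Returns the mean and log-variance of the joint Gaussian.
    Note: this is a generic PoE sketch, not LD-CVAE's exact code.
    """
    mus = np.asarray(mus, dtype=float)
    precisions = np.exp(-np.asarray(logvars, dtype=float))  # 1 / sigma^2
    joint_var = 1.0 / precisions.sum(axis=0)                # sum of precisions
    joint_mu = joint_var * (precisions * mus).sum(axis=0)   # precision-weighted mean
    return joint_mu, np.log(joint_var)

# Example: fuse a standard-normal prior expert N(0, 1) with an
# image-posterior expert N(2, 1); the joint is N(1, 0.5).
mu, logvar = product_of_experts([[0.0], [2.0]], [[0.0], [0.0]])
```

A practical consequence of this form is graceful handling of missing modalities: when the genomic expert is absent at test time, its row is simply dropped from the product and the joint distribution falls back to the remaining experts.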