🤖 AI Summary
Real-world multimodal survival analysis frequently encounters missing modality issues; existing approaches neglect inter-modal distributional discrepancies, leading to inconsistent reconstructions and poor generalization. To address this, we propose a low-rank Transformer–normalizing flow fusion framework. First, we design a class-conditional normalizing flow module to align cross-modal distributions and construct a distributionally consistent latent space. Second, we incorporate a low-rank Transformer to model intra-modal long-range dependencies, enhancing stability in high-dimensional multimodal fusion. This is the first work to integrate normalizing flows with low-rank Transformers for incomplete multimodal survival analysis, effectively mitigating distribution shift and overfitting. Extensive experiments demonstrate that our method achieves state-of-the-art performance under both complete and incomplete modality settings, significantly improving the robustness and accuracy of survival prediction.
📝 Abstract
In recent years, survival analysis based on multimodal medical data has attracted much attention. However, real-world datasets often suffer from incomplete modalities, where some patients' modality information is missing due to acquisition limitations or system failures. Existing methods typically infer missing modalities directly from observed ones using deep neural networks, but they often ignore distributional discrepancies across modalities, resulting in inconsistent and unreliable modality reconstruction. To address these challenges, we propose a novel framework that combines a low-rank Transformer with a flow-based generative model for robust and flexible multimodal survival prediction. Specifically, we first formulate the problem as incomplete multimodal survival analysis over multi-instance representations of whole slide images (WSIs) and genomic profiles. We then propose a class-specific normalizing flow for cross-modal distribution alignment: conditioned on class labels, it models and transforms the cross-modal distribution. Owing to the invertible structure and exact density modeling of normalizing flows, the model constructs a distribution-consistent latent space for the missing modality, improving the agreement between the reconstructed data and the true distribution. Finally, we design a lightweight low-rank Transformer to model intra-modal dependencies while alleviating overfitting in high-dimensional modality fusion. Extensive experiments demonstrate that our method not only achieves state-of-the-art performance under complete-modality settings, but also maintains robust, superior accuracy under incomplete modalities.
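The core property the abstract relies on — a normalizing flow is invertible and yields an exact log-determinant, so a missing-modality latent can be reconstructed consistently — can be illustrated with a class-conditional affine coupling layer. This is a minimal numpy sketch, not the authors' implementation; the function names (`coupling_forward`, `coupling_inverse`), the tiny MLP, and all dimensions are hypothetical choices for illustration.

```python
import numpy as np

def mlp(h, W1, b1, W2, b2):
    # Tiny 2-layer network producing scale/shift parameters.
    return np.tanh(h @ W1 + b1) @ W2 + b2

def coupling_forward(x, cond, params):
    # Split features; transform the second half conditioned on the
    # first half plus a one-hot class embedding (class-specific flow).
    d = x.shape[1] // 2
    x1, x2 = x[:, :d], x[:, d:]
    st = mlp(np.concatenate([x1, cond], axis=1), *params)
    s, t = np.tanh(st[:, :d]), st[:, d:]   # bounded scales for stability
    y2 = x2 * np.exp(s) + t
    logdet = s.sum(axis=1)                 # exact log|det Jacobian|
    return np.concatenate([x1, y2], axis=1), logdet

def coupling_inverse(y, cond, params):
    # Exact inverse: the same network is re-evaluated on the untouched half.
    d = y.shape[1] // 2
    y1, y2 = y[:, :d], y[:, d:]
    st = mlp(np.concatenate([y1, cond], axis=1), *params)
    s, t = np.tanh(st[:, :d]), st[:, d:]
    x2 = (y2 - t) * np.exp(-s)
    return np.concatenate([y1, x2], axis=1)
```

Stacking several such layers (with the class condition fed to each) gives a flow whose density is tractable via the summed log-determinants, which is what allows the latent space of the reconstructed modality to stay consistent with the target distribution.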
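The low-rank Transformer idea — factorizing each projection matrix to cut parameters and curb overfitting in high-dimensional fusion — can be sketched as a single attention head whose projections are products of two thin matrices. Again a hedged illustration under assumed shapes, not the paper's architecture: with rank r much smaller than dimension d, each projection needs 2*d*r parameters instead of d*d.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def low_rank_attention(X, Wq_a, Wq_b, Wk_a, Wk_b, Wv_a, Wv_b):
    # Each full projection W (d x d) is replaced by A @ B with
    # A: (d, r) and B: (r, d), shrinking parameters from d*d to 2*d*r.
    Q = X @ Wq_a @ Wq_b
    K = X @ Wk_a @ Wk_b
    V = X @ Wv_a @ Wv_b
    attn = softmax(Q @ K.T / np.sqrt(Q.shape[-1]))
    return attn @ V
```

For d = 32 and r = 4, each projection drops from 1024 to 256 parameters, which is the kind of capacity reduction that helps when fusing high-dimensional WSI and genomic features on small cohorts.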