🤖 AI Summary
Multimodal cancer survival prediction faces key challenges, including high-dimensional redundancy in whole-slide images (WSIs) and genomic data, difficulty in cross-modal alignment, and weak supervisory signals. Method: We propose an LLM-driven, knowledge-enhanced paradigm: the first to integrate expert pathology reports and LLM-generated cancer-specific prognostic knowledge into survival modeling. We design a Knowledge-Enhanced Cross-Modal (KECM) attention module that focuses the model on survival-relevant features and semantically aligns the modalities. The framework jointly incorporates knowledge distillation, knowledge-guided cross-attention, and a deep survival analysis network. Contribution/Results: Evaluated on five public benchmark datasets, our method achieves state-of-the-art performance, with statistically significant improvements in C-index and IPCW Brier score. The source code will be publicly released.
📝 Abstract
Current multimodal survival prediction methods typically rely on pathology whole-slide images (WSIs) and genomic data, both of which are high-dimensional and redundant, making it difficult to extract discriminative features and to align the two modalities. Moreover, a simple survival follow-up label is insufficient to supervise such a complex task. To address these challenges, we propose KEMM, an LLM-driven Knowledge-Enhanced Multimodal Model for cancer survival prediction that integrates expert reports and prognostic background knowledge. 1) Expert reports, provided by pathologists on a case-by-case basis and refined by a large language model (LLM), offer succinct, clinically focused diagnostic statements; such information is often indicative of survival outcomes. 2) Prognostic background knowledge (PBK), generated concisely by the LLM, supplies cancer-type-specific prognostic context that further enhances survival prediction. To leverage this knowledge, we introduce the knowledge-enhanced cross-modal (KECM) attention module, which guides the network to focus on discriminative, survival-relevant features within highly redundant modalities. Extensive experiments on five datasets demonstrate that KEMM achieves state-of-the-art performance. The code will be released upon acceptance.
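The abstract describes the KECM module only at a high level. As an illustration of the general idea, here is a minimal single-head cross-attention sketch in NumPy in which knowledge-text embeddings act as queries that pool information from a redundant set of WSI patch (or gene) features. All shapes, the function name, and the random stand-in projections are assumptions for illustration, not the paper's actual implementation:

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def knowledge_guided_cross_attention(knowledge, patches, d_k=64, seed=0):
    """Hypothetical knowledge-guided cross-attention: knowledge embeddings
    (e.g., LLM-refined report / PBK text features) serve as queries that
    attend over redundant patch features to pool survival-relevant signal."""
    rng = np.random.default_rng(seed)
    d_know, d_feat = knowledge.shape[1], patches.shape[1]
    # Random projections stand in for learned weight matrices.
    W_q = rng.standard_normal((d_know, d_k)) / np.sqrt(d_know)
    W_k = rng.standard_normal((d_feat, d_k)) / np.sqrt(d_feat)
    W_v = rng.standard_normal((d_feat, d_k)) / np.sqrt(d_feat)
    Q, K, V = knowledge @ W_q, patches @ W_k, patches @ W_v
    attn = softmax(Q @ K.T / np.sqrt(d_k))  # (n_knowledge, n_patches)
    return attn @ V                         # knowledge-conditioned summary

# Toy usage: 4 knowledge tokens attend over 1000 WSI patch embeddings.
knowledge = np.random.default_rng(1).standard_normal((4, 256))
patches = np.random.default_rng(2).standard_normal((1000, 512))
out = knowledge_guided_cross_attention(knowledge, patches)
print(out.shape)  # (4, 64)
```

The pooled output is a compact, knowledge-conditioned representation that a downstream survival head could consume; in the actual model the projections would be learned and the attention would presumably be multi-head.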