🤖 AI Summary
Accurate survival prediction for non-small cell lung cancer (NSCLC) patients undergoing immune checkpoint inhibitor (ICI) therapy remains challenging due to heterogeneous treatment responses and limited predictive biomarkers.
Method: We constructed the first large-scale multimodal cohort integrating 3D CT imaging and structured clinical data, and proposed a cross-modal masked learning framework: (i) a Slice-Depth Transformer to model spatial-depth dependencies in 3D CT volumes; (ii) a graph-structured Transformer to encode relational patterns among clinical variables; and (iii) a dual-branch Transformer for deep cross-modal feature fusion and missing modality reconstruction.
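The summary does not specify how the Slice-Depth Transformer arranges its attention. One common way to model spatial-depth dependencies in a 3D CT volume is factorized attention: attend over spatial patches within each slice, then over the depth axis at each spatial position. The sketch below illustrates that idea with a minimal single-head NumPy attention; all sizes and names are illustrative, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(1)

def self_attention(x):
    """Single-head scaled dot-product self-attention over axis 0 (NumPy sketch)."""
    d = x.shape[-1]
    scores = x @ x.T / np.sqrt(d)
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))  # stable softmax
    w /= w.sum(axis=-1, keepdims=True)
    return w @ x

# hypothetical CT volume as patch tokens: depth x patches_per_slice x dim
depth, patches, dim = 6, 4, 8
volume = rng.normal(size=(depth, patches, dim))

# factorized attention: (1) within each slice, over spatial patches...
spatial = np.stack([self_attention(volume[z]) for z in range(depth)])

# ...(2) across depth at each spatial position, linking adjacent slices
out = np.stack([self_attention(spatial[:, p]) for p in range(patches)], axis=1)

print(out.shape)  # (6, 4, 8): same token grid, now depth-aware
```

Factorizing slice and depth attention keeps the cost linear in each axis instead of quadratic in the full token count, which matters for high-resolution 3D volumes.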
Contribution/Results: Our method significantly outperforms existing unimodal and multimodal baselines in progression-free survival (PFS) and overall survival (OS) prediction (C-index improvement ≥0.04). It provides the first high-accuracy, interpretable risk stratification tailored to ICI-treated NSCLC patients, establishing a benchmark tool for personalized immunotherapy decision-making.
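The C-index reported above (Harrell's concordance index) measures how often the model ranks patient risk consistently with observed survival, while handling censoring. A minimal reference implementation on toy data (values are illustrative, not from the cohort):

```python
def c_index(times, events, risk_scores):
    """Harrell's concordance index: fraction of comparable patient pairs
    whose predicted risk ordering matches their observed survival ordering."""
    concordant, comparable = 0.0, 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # a pair is comparable only if patient i had an observed event
            # and lived strictly shorter than patient j
            if events[i] and times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5  # ties get half credit
    return concordant / comparable

# toy cohort: higher risk score should mean shorter survival
times  = [2, 4, 6, 8]        # months to event or censoring
events = [1, 1, 0, 1]        # 1 = event observed, 0 = censored
risks  = [0.9, 0.7, 0.4, 0.1]
print(c_index(times, events, risks))  # 1.0, perfectly concordant
```

A C-index of 0.5 corresponds to random ranking and 1.0 to perfect ranking, so an improvement of ≥0.04 over baselines is a substantial gain on this scale.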
📝 Abstract
Accurate prognosis of non-small cell lung cancer (NSCLC) patients undergoing immunotherapy is essential for personalized treatment planning, enabling informed patient decisions, and improving both treatment outcomes and quality of life. However, the lack of large, relevant datasets and effective multi-modal feature fusion strategies poses significant challenges in this domain. To address these challenges, we present a large-scale dataset and introduce a novel framework for multi-modal feature fusion aimed at enhancing the accuracy of survival prediction. The dataset comprises 3D CT images and corresponding clinical records from NSCLC patients treated with immune checkpoint inhibitors (ICIs), along with progression-free survival (PFS) and overall survival (OS) data. We further propose a cross-modality masked learning approach for medical feature fusion, consisting of two distinct branches, each tailored to its respective modality: a Slice-Depth Transformer for extracting 3D features from CT images and a graph-structured Transformer for learning node features and relationships among clinical variables in tabular data. The fusion process is guided by a masked modality learning strategy, wherein the model uses the intact modality to reconstruct missing components of the other. This mechanism improves the integration of modality-specific features and fosters richer inter-modality interactions. Our approach demonstrates superior performance in multi-modal integration for NSCLC survival prediction, surpassing existing methods and setting a new benchmark for prognostic models in this context.
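The masked modality learning strategy described above can be sketched as follows: hide part of one modality's token embeddings and reconstruct them from the intact modality via cross-attention, so that minimizing the reconstruction loss forces the model to learn cross-modal correspondences. Everything below is a hypothetical NumPy illustration of that training signal, not the paper's implementation; token counts, dimensions, and the fixed mask are all illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def cross_attention(queries, keys, values):
    """Single-head scaled dot-product cross-attention (NumPy sketch)."""
    d = queries.shape[-1]
    scores = queries @ keys.T / np.sqrt(d)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ values

# hypothetical per-patient token embeddings: 8 CT tokens from the
# imaging branch, 5 clinical-variable tokens from the tabular branch
ct_tokens   = rng.normal(size=(8, 16))
clin_tokens = rng.normal(size=(5, 16))

# masked modality learning: hide a subset of clinical tokens
# (fixed here for reproducibility; sampled randomly during training)
mask = np.array([False, True, True, True, False])
mask_token = np.zeros(16)            # a learned embedding in practice
masked_clin = np.where(mask[:, None], mask_token, clin_tokens)

# reconstruct the hidden tokens from the intact CT branch
reconstructed = cross_attention(masked_clin, ct_tokens, ct_tokens)

# reconstruction loss on masked positions only; minimizing it drives
# the fusion module to align imaging and clinical representations
recon_loss = np.mean((reconstructed[mask] - clin_tokens[mask]) ** 2)
print(recon_loss > 0)
```

The same mechanism doubles as imputation at inference time: when a patient's clinical record (or scan) is incomplete, the intact modality can stand in for the missing one.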