🤖 AI Summary
Whole-slide image (WSI)-based survival analysis faces three challenges: noisy features, scarce labeled data, and underuse of the patient-specific information embedded in pathology reports. To address these, we propose Rasa, a report-auxiliary self-distillation framework. Rasa uses large language models to extract fine-grained semantic descriptions from noisy pathology text, applies a text-guided self-distillation mechanism to suppress irrelevant WSI features, and introduces a risk-aware mix-up strategy to increase data diversity and improve modeling of the underlying risk distribution. Rasa also enables end-to-end multimodal alignment learning between WSIs and pathology reports. Evaluated on a curated colorectal cancer (CRC) dataset and the public TCGA-BRCA cohort, Rasa outperforms state-of-the-art methods in cancer prognosis prediction across both cancer types.
📝 Abstract
Survival analysis based on Whole Slide Images (WSIs) is crucial for evaluating cancer prognosis, because WSIs offer detailed microscopic information essential for predicting patient outcomes. However, traditional WSI-based survival analysis methods usually face noisy features and limited data accessibility, which hinder their ability to capture critical prognostic features. Although pathology reports provide rich patient-specific information that could assist the analysis, their potential to enhance WSI-based survival analysis remains largely unexplored. To this end, this paper proposes a novel Report-auxiliary self-distillation (Rasa) framework for WSI-based survival analysis. First, advanced large language models (LLMs) extract fine-grained, WSI-relevant textual descriptions from the original noisy pathology reports via a carefully designed task prompt. Next, a self-distillation-based pipeline filters out irrelevant or redundant WSI features for the student model under the guidance of the teacher model's textual knowledge. Finally, a risk-aware mix-up strategy is incorporated into the training of the student model to increase both the quantity and the diversity of the training data. Extensive experiments on our collected data (CRC) and public data (TCGA-BRCA) show that Rasa outperforms state-of-the-art methods. Our code is available at https://github.com/zhengwang9/Rasa.
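
The abstract does not specify how the risk-aware mix-up is computed, so the following is only an illustrative sketch of one plausible reading, not the authors' implementation. It assumes each slide has already been reduced to a fixed-length embedding and that the current model provides a scalar predicted risk per slide. Each sample is mixed with its nearest neighbour in predicted risk, using standard Beta-distributed mix-up weights. The function name `risk_aware_mixup`, the pairing rule, and the `alpha` default are all hypothetical choices made for this example.

```python
import numpy as np


def risk_aware_mixup(features, risks, alpha=0.4, rng=None):
    """Mix each sample with its nearest neighbour in predicted risk.

    Hypothetical sketch: the pairing rule and the use of predicted risks
    as soft targets are assumptions, not taken from the Rasa paper.

    features: (n, d) array of slide-level embeddings.
    risks:    (n,) array of predicted risk scores from the current model.
    Returns mixed features, mixed risks, partner indices, and mixing weights.
    """
    rng = np.random.default_rng() if rng is None else rng
    features = np.asarray(features, dtype=float)
    risks = np.asarray(risks, dtype=float)
    n = len(risks)
    if n < 2:
        raise ValueError("need at least two samples to mix")

    # Pairwise absolute risk gaps; mask the diagonal so that no sample
    # is paired with itself.
    gaps = np.abs(risks[:, None] - risks[None, :])
    np.fill_diagonal(gaps, np.inf)
    partners = gaps.argmin(axis=1)

    # Standard mix-up weights drawn from Beta(alpha, alpha).
    lam = rng.beta(alpha, alpha, size=n)

    # Convex combination of each sample with its risk-nearest partner.
    mixed_features = lam[:, None] * features + (1.0 - lam[:, None]) * features[partners]
    mixed_risks = lam * risks + (1.0 - lam) * risks[partners]
    return mixed_features, mixed_risks, partners, lam
```

Pairing by predicted risk keeps each synthetic sample close to a single region of the risk distribution, which is one way to read "risk-aware". Mixing censored survival labels directly is not straightforward, so this sketch interpolates predicted risks as soft targets instead; the released code at the URL above is the reference for what Rasa actually does.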