MMSF: Multitask and Multimodal Supervised Framework for WSI Classification and Survival Analysis

📅 2026-01-28
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work proposes a multitask, multimodal supervised framework to address the challenge of integrating heterogeneous data such as whole-slide images and clinical records. Built upon a linear-complexity multiple instance learning (MIL) backbone, the method leverages graph neural networks to extract histopathological features, standardizes clinical data into unified embeddings, and explicitly decomposes shared and modality-specific representations to enable effective cross-modal alignment and fusion. Notably, it introduces the Mamba architecture into multimodal pathological analysis for the first time, constructing an efficient Mamba-based MIL encoder. Evaluated on CAMELYON16 and TCGA-NSCLC, the approach improves classification accuracy by 2.1–6.6% and AUC by 2.2–6.9%. Across five TCGA survival cohorts, it achieves significantly higher concordance indices (C-index), outperforming unimodal and other multimodal methods by 7.1–9.8% and 5.6–7.1%, respectively.
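The "linear-complexity" property of the Mamba-based MIL encoder comes from its state-space recurrence, which visits each patch token exactly once, so cost grows as O(N) in the number of patches rather than the O(N²) of attention. A minimal sketch of such a scan over patch embeddings, with random parameters standing in for the learned, input-dependent Mamba weights (the function `ssm_scan` and all dimensions here are illustrative assumptions, not the paper's implementation):

```python
import numpy as np

rng = np.random.default_rng(1)

def ssm_scan(x, a, B, C):
    """Minimal diagonal state-space scan: one pass over the token
    sequence, O(N) in the number of patches. Hypothetical stand-in
    for a Mamba-style MIL encoder over WSI patch embeddings."""
    n, _ = x.shape
    h = np.zeros(a.shape[0])
    ys = np.empty((n, C.shape[0]))
    for t in range(n):
        h = a * h + B @ x[t]   # recurrent state update (diagonal transition)
        ys[t] = C @ h          # per-token readout
    return ys

patches = rng.normal(size=(512, 64))   # 512 patch embeddings of dim 64
a = np.full(16, 0.9)                   # stable diagonal transition
B = rng.normal(scale=0.1, size=(16, 64))
C = rng.normal(scale=0.1, size=(8, 16))

tokens = ssm_scan(patches, a, B, C)
bag = tokens.mean(axis=0)              # slide-level (bag) representation
print(tokens.shape, bag.shape)         # (512, 8) (8,)
```

Because the state `h` is fixed-size, memory is also constant in sequence length, which is what makes this style of encoder attractive for gigapixel slides with tens of thousands of patches.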

📝 Abstract
Multimodal evidence is critical in computational pathology: gigapixel whole slide images capture tumor morphology, while patient-level clinical descriptors preserve complementary context for prognosis. Integrating such heterogeneous signals remains challenging because feature spaces exhibit distinct statistics and scales. We introduce MMSF, a multitask and multimodal supervised framework built on a linear-complexity MIL backbone that explicitly decomposes and fuses cross-modal information. MMSF comprises a graph feature extraction module embedding tissue topology at the patch level, a clinical data embedding module standardizing patient attributes, a feature fusion module aligning modality-shared and modality-specific representations, and a Mamba-based MIL encoder with multitask prediction heads. Experiments on CAMELYON16 and TCGA-NSCLC demonstrate 2.1–6.6% accuracy and 2.2–6.9% AUC improvements over competitive baselines, while evaluations on five TCGA survival cohorts yield 7.1–9.8% C-index improvements compared with unimodal methods and 5.6–7.1% over multimodal alternatives.
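The decomposition into modality-shared and modality-specific representations described in the abstract can be sketched as follows. This is a minimal illustration under assumed dimensions, with random matrices standing in for learned projection layers; all names (`linear`, `W_img_shared`, etc.) are hypothetical, not the paper's API:

```python
import numpy as np

rng = np.random.default_rng(0)

def linear(d_in, d_out):
    # Random projection standing in for a learned linear layer.
    return rng.normal(scale=d_in ** -0.5, size=(d_in, d_out))

d_img, d_clin, d = 1024, 32, 256  # assumed feature sizes

# Each modality is decomposed into a shared and a specific component.
W_img_shared, W_img_spec = linear(d_img, d), linear(d_img, d)
W_clin_shared, W_clin_spec = linear(d_clin, d), linear(d_clin, d)

h_img = rng.normal(size=d_img)    # pooled WSI feature (e.g. from the graph/MIL encoder)
h_clin = rng.normal(size=d_clin)  # standardized clinical embedding

s_img, p_img = h_img @ W_img_shared, h_img @ W_img_spec
s_clin, p_clin = h_clin @ W_clin_shared, h_clin @ W_clin_spec

# An alignment objective pulls the shared components together;
# training would minimize this alongside the task losses.
align_loss = np.mean((s_img - s_clin) ** 2)

# Fused representation: averaged shared part plus both specific parts,
# fed to the classification and survival heads.
fused = np.concatenate([(s_img + s_clin) / 2, p_img, p_clin])
print(fused.shape)  # (768,)
```

Keeping the specific components alongside the aligned shared one is what lets fusion preserve information unique to each modality instead of collapsing both onto a common subspace.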
Problem

Research questions and friction points this paper is trying to address.

multimodal integration
whole slide image classification
survival analysis
computational pathology
heterogeneous data fusion
Innovation

Methods, ideas, or system contributions that make the work stand out.

multimodal fusion
Mamba-based MIL
graph feature extraction
multitask learning
computational pathology
Chengying She
Chengwei Chen
Xinran Zhang (University of Science and Technology of China)
Ben Wang (University of Oklahoma)
Lizhuang Liu
Chengwei Shao
Yun Bian