🤖 AI Summary
This work proposes a multitask, multimodal supervised framework to address the challenge of integrating heterogeneous data such as whole-slide images and clinical records. Built upon a linear-complexity multiple instance learning (MIL) backbone, the method leverages graph neural networks to extract histopathological features, standardizes clinical data into unified embeddings, and explicitly decomposes shared and modality-specific representations to enable effective cross-modal alignment and fusion. Notably, it introduces the Mamba architecture into multimodal pathological analysis for the first time, constructing an efficient Mamba-based MIL encoder. Evaluated on CAMELYON16 and TCGA-NSCLC, the approach improves classification accuracy by 2.1–6.6% and AUC by 2.2–6.9%. Across five TCGA survival cohorts, it achieves significantly higher concordance indices (C-index), outperforming unimodal and other multimodal methods by 7.1–9.8% and 5.6–7.1%, respectively.
📝 Abstract
Multimodal evidence is critical in computational pathology: gigapixel whole-slide images capture tumor morphology, while patient-level clinical descriptors preserve complementary context for prognosis. Integrating such heterogeneous signals remains challenging because feature spaces exhibit distinct statistics and scales. We introduce MMSF, a multitask and multimodal supervised framework built on a linear-complexity MIL backbone that explicitly decomposes and fuses cross-modal information. MMSF comprises a graph feature extraction module embedding tissue topology at the patch level, a clinical data embedding module standardizing patient attributes, a feature fusion module aligning modality-shared and modality-specific representations, and a Mamba-based MIL encoder with multitask prediction heads. Experiments on CAMELYON16 and TCGA-NSCLC demonstrate 2.1–6.6% accuracy and 2.2–6.9% AUC improvements over competitive baselines, while evaluations on five TCGA survival cohorts yield 7.1–9.8% C-index improvements compared with unimodal methods and 5.6–7.1% over multimodal alternatives.
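The abstract's central fusion idea, splitting each modality's embedding into a shared component (aligned across modalities) and a specific component (kept private), can be sketched in miniature. This is an illustrative toy, not the paper's implementation: the projection heads, dimensions, and averaging-based alignment below are all assumptions for demonstration.

```python
# Toy sketch of shared/specific decomposition and fusion across two
# modalities (WSI features and clinical embeddings). All names, sizes,
# and the averaging alignment are hypothetical, not from the paper.
import random

random.seed(0)
DIM = 8  # illustrative embedding size per modality


def project(vec, weights):
    """Apply a DIM x DIM linear projection to a DIM-vector."""
    return [sum(w * x for w, x in zip(row, vec)) for row in weights]


def rand_matrix(n):
    return [[random.uniform(-0.1, 0.1) for _ in range(n)] for _ in range(n)]


# One projection pair per modality: a "shared" head whose outputs are
# meant to land in a common space, and a "specific" head that retains
# modality-private information.
heads = {
    "wsi":      {"shared": rand_matrix(DIM), "specific": rand_matrix(DIM)},
    "clinical": {"shared": rand_matrix(DIM), "specific": rand_matrix(DIM)},
}


def decompose(name, embedding):
    h = heads[name]
    return project(embedding, h["shared"]), project(embedding, h["specific"])


def fuse(wsi_emb, clin_emb):
    w_shared, w_spec = decompose("wsi", wsi_emb)
    c_shared, c_spec = decompose("clinical", clin_emb)
    # Shared parts are merged (here: averaged) after alignment;
    # specific parts are concatenated so neither modality's private
    # signal is discarded.
    shared = [(a + b) / 2 for a, b in zip(w_shared, c_shared)]
    return shared + w_spec + c_spec  # length 3 * DIM


wsi_emb = [random.gauss(0, 1) for _ in range(DIM)]
clin_emb = [random.gauss(0, 1) for _ in range(DIM)]
fused = fuse(wsi_emb, clin_emb)
print(len(fused))  # 3 * DIM = 24
```

In the full method the fused representation would feed the Mamba-based MIL encoder and its multitask heads; in practice the alignment of shared components is typically enforced by a training loss rather than simple averaging.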