AI Summary
This work addresses the limitation of existing multiple instance learning (MIL) approaches in computational pathology, which rely solely on morphological features from H&E-stained images and struggle to model endpoint tasks governed by molecular states, such as survival outcomes, biomarker status, or molecular subtypes. To overcome this, the authors propose MIST, a novel method that introduces a molecular-informed virtual staining mechanism into MIL. By leveraging paired spatial transcriptomics data, MIST constructs cross-modal prototypes that reorganize H&E-derived features along molecularly guided axes, thereby enhancing representation without requiring transcriptomic input during inference. The approach integrates spatial transcriptomics clustering, prototype anchoring, and frozen foundation model mapping. Evaluated across 23 downstream tasks and 8 MIL aggregators, MIST outperforms baselines in 240 out of 256 configurations, achieving an average improvement of 3.5% (including +5.2% in survival prediction, +3.3% in subtype classification, and +2.6% in biomarker prediction).
Abstract
Multiple instance learning (MIL) is the dominant framework for whole-slide image analysis in computational pathology, typically combining a frozen patch encoder, a projection layer, and a slide-level aggregator. While encoders and aggregators have been extensively studied, the projection layer remains a largely morphology-only bottleneck. This limits endpoints such as biomarker status and survival, which are governed by a molecular state that is not fully captured by H&E morphology. We introduce Molecularly Informed Staining Transform (MIST), a plug-in replacement for the MIL projection layer that uses paired spatial transcriptomics only during training to construct virtual molecular stains. MIST clusters gene expression profiles into cross-modal prototypes, anchors them in the frozen foundation model feature space, and uses them to reorganize H&E patch features along molecularly guided axes. It requires no transcriptomics at inference and can be inserted before standard MIL aggregators. We evaluate MIST across 23 downstream tasks and 8 MIL aggregators. MIST improves 240 of 256 configurations over the standard projection layer, with an average gain of +3.5%, observed consistently across endpoint types: +5.2% on survival prediction, +3.3% on tissue subtyping, and +2.6% on biomarker prediction. Ablations confirm that gene-derived prototypes are the primary source of the gains, while spatial, biological, and pathological analyses show that cross-modal prototype affinities capture spatially coherent molecular programs from H&E alone.
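The three steps named in the abstract (clustering expression profiles, anchoring the clusters in the frozen feature space, and reorganizing patch features by prototype affinity) can be pictured with a minimal NumPy sketch. This is an illustration under stated assumptions, not the authors' implementation: the abstract does not specify the clustering algorithm, the anchoring rule, or the affinity function, so plain k-means, per-cluster feature means, and a softmax over cosine similarities are stand-ins. All function names, array sizes, and the temperature value are hypothetical, and the data is random.

```python
import numpy as np

def kmeans(x, k, iters=20, seed=0):
    """Plain Lloyd's k-means (an assumed stand-in for the paper's clustering step)."""
    rng = np.random.default_rng(seed)
    centers = x[rng.choice(len(x), size=k, replace=False)]  # fancy indexing copies
    for _ in range(iters):
        # Squared Euclidean distance from every row to every center: shape (n, k).
        dists = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
        labels = dists.argmin(axis=1)
        for j in range(k):
            members = x[labels == j]
            if len(members) > 0:
                centers[j] = members.mean(axis=0)
    return labels

def anchor_prototypes(he_feats, labels, k):
    """Assumed anchoring rule: mean H&E feature of the spots in each expression cluster.
    Empty clusters are skipped, so fewer than k prototypes may be returned."""
    protos = [he_feats[labels == j].mean(axis=0) for j in range(k) if np.any(labels == j)]
    return np.stack(protos)

def l2_normalize(x, eps=1e-8):
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)

def prototype_affinities(patch_feats, prototypes, temperature=0.1):
    """Assumed affinity: softmax over cosine similarity between patches and prototypes."""
    sims = l2_normalize(patch_feats) @ l2_normalize(prototypes).T
    logits = sims / temperature
    logits -= logits.max(axis=1, keepdims=True)  # subtract row max for numerical stability
    weights = np.exp(logits)
    return weights / weights.sum(axis=1, keepdims=True)

# Training time: paired spots carry both expression and frozen-encoder H&E features.
rng = np.random.default_rng(0)
n_spots, n_genes, feat_dim, k = 200, 50, 16, 4
expr = rng.normal(size=(n_spots, n_genes))       # synthetic gene expression per spot
he_feats = rng.normal(size=(n_spots, feat_dim))  # synthetic frozen-encoder features
labels = kmeans(expr, k)
prototypes = anchor_prototypes(he_feats, labels, k)

# Inference time: only H&E patch features are needed; no transcriptomics is used.
slide_patches = rng.normal(size=(500, feat_dim))
affinities = prototype_affinities(slide_patches, prototypes)
```

In this sketch each row of `affinities` is a per-patch distribution over prototypes, which could be passed to any standard MIL aggregator in place of, or alongside, the usual projected features. How MIST actually combines the affinities with the original patch features is not stated in the abstract.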