Towards Spatial Transcriptomics-driven Pathology Foundation Models

📅 2026-02-15

🤖 AI Summary

This work addresses the challenge of integrating spatial transcriptomics (ST) with histopathological images to strengthen the representations learned by foundation models in computational pathology. The authors propose SEAL (Spatial Expression-Aligned Learning), a framework for ST-guided self-supervised, parameter-efficient fine-tuning that injects localized gene expression signals into existing pathology vision encoders. SEAL serves as a drop-in replacement for the original encoder, generalizes robustly to out-of-distribution data, and enables cross-modal retrieval between gene expression and image regions. Trained on over 700,000 paired ST spot and tissue region examples from tumor and normal samples across 14 organs, SEAL consistently outperforms vision-only and ST prediction baselines across 38 slide-level and 15 patch-level tasks, improving prediction of molecular status, pathway activity, and treatment response, as well as patch-level gene expression.
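The summary describes SEAL as parameter-efficient fine-tuning of an existing encoder but does not say which adapter mechanism it uses. As a minimal sketch, assuming a LoRA-style low-rank update (a common parameter-efficient technique swapped in here for illustration, not confirmed as SEAL's design), a frozen linear layer of a vision encoder can be extended with a small trainable correction. The class name `LowRankAdapterLinear` and its `rank` and `alpha` parameters are hypothetical.

```python
import numpy as np


class LowRankAdapterLinear:
    """Frozen linear layer W (d_out x d_in) plus a trainable low-rank update B @ A.

    Hypothetical LoRA-style illustration of parameter-efficient finetuning;
    not SEAL's confirmed adapter design.
    """

    def __init__(self, weight, rank=4, alpha=8.0, seed=0):
        rng = np.random.default_rng(seed)
        d_out, d_in = weight.shape
        self.weight = weight          # pretrained weights: kept frozen
        self.scale = alpha / rank     # scaling applied to the low-rank update
        # A starts small and random, B starts at zero, so the adapter is initially a no-op.
        self.A = rng.normal(0.0, 0.01, size=(rank, d_in))
        self.B = np.zeros((d_out, rank))

    def __call__(self, x):
        # x: (batch, d_in) -> (batch, d_out)
        base = x @ self.weight.T
        delta = (x @ self.A.T) @ self.B.T
        return base + self.scale * delta

    def trainable_parameters(self):
        # Only A and B would be updated during finetuning.
        return self.A.size + self.B.size


rng = np.random.default_rng(1)
W = rng.normal(size=(64, 128))        # stand-in for one pretrained encoder layer
layer = LowRankAdapterLinear(W, rank=4)
x = rng.normal(size=(2, 128))

print(np.allclose(layer(x), x @ W.T))         # True: zero-initialized B leaves outputs unchanged
print(layer.trainable_parameters(), W.size)   # 768 8192
```

Zero-initializing `B` is what would let such an adapter act as a drop-in: before any vision-omics training the adapted layer behaves exactly like the original, and only 768 adapter values (versus 8,192 frozen weights in this toy layer) would be trained.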

📝 Abstract

Spatial transcriptomics (ST) provides spatially resolved measurements of gene expression, enabling characterization of the molecular landscape of human tissue beyond histological assessment as well as localized readouts that can be aligned with morphology. Concurrently, the success of multimodal foundation models that integrate vision with complementary modalities suggests that morphomolecular coupling between local expression and morphology can be systematically used to improve histological representations themselves. We introduce Spatial Expression-Aligned Learning (SEAL), a vision-omics self-supervised learning framework that infuses localized molecular information into pathology vision encoders. Rather than training new encoders from scratch, SEAL is designed as a parameter-efficient vision-omics finetuning method that can be flexibly applied to widely used pathology foundation models. We instantiate SEAL by training on over 700,000 paired gene expression spot-tissue region examples spanning tumor and normal samples from 14 organs. Tested across 38 slide-level and 15 patch-level downstream tasks, SEAL provides a drop-in replacement for pathology foundation models that consistently improves performance over widely used vision-only and ST prediction baselines on slide-level molecular status, pathway activity, and treatment response prediction, as well as patch-level gene expression prediction tasks. Additionally, SEAL encoders exhibit robust domain generalization on out-of-distribution evaluations and enable new cross-modal capabilities such as gene-to-image retrieval. Our work proposes a general framework for ST-guided finetuning of pathology foundation models, showing that augmenting existing models with localized molecular supervision is an effective and practical step for improving visual representations and expanding their cross-modal utility.
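The abstract does not state the objective used to align tissue regions with their paired expression spots. A common choice for paired two-modality data is a CLIP-style symmetric InfoNCE loss; the sketch below assumes that objective (an illustration, not necessarily SEAL's loss) and shows how a shared embedding space also supports gene-to-image retrieval by cosine similarity. The embeddings are synthetic stand-ins for encoder outputs, and the function names and the `temperature` value are hypothetical.

```python
import numpy as np

# Hypothetical sketch: a CLIP-style contrastive objective assumed for illustration,
# not the loss confirmed for SEAL.


def l2_normalize(z, eps=1e-8):
    # Scale each row to unit length so dot products become cosine similarities.
    return z / (np.linalg.norm(z, axis=1, keepdims=True) + eps)


def log_softmax(logits, axis):
    # Numerically stable log-softmax along the given axis.
    shifted = logits - logits.max(axis=axis, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=axis, keepdims=True))


def symmetric_info_nce(img_emb, expr_emb, temperature=0.07):
    # Row i of img_emb (tissue region) and row i of expr_emb (ST spot) form a positive pair;
    # every other row in the batch acts as a negative.
    logits = l2_normalize(img_emb) @ l2_normalize(expr_emb).T / temperature
    idx = np.arange(logits.shape[0])
    loss_img_to_expr = -log_softmax(logits, axis=1)[idx, idx].mean()
    loss_expr_to_img = -log_softmax(logits, axis=0)[idx, idx].mean()
    return 0.5 * (loss_img_to_expr + loss_expr_to_img)


def retrieve_patches(query_expr, patch_emb, k=3):
    # Gene-to-image retrieval: indices of the k patches most similar to one expression embedding.
    sims = l2_normalize(patch_emb) @ l2_normalize(query_expr[None, :])[0]
    return np.argsort(-sims)[:k]


# Synthetic embeddings: both "encoders" see the same underlying signal plus small noise.
rng = np.random.default_rng(0)
shared = rng.normal(size=(8, 32))
img_emb = shared + 0.05 * rng.normal(size=shared.shape)
expr_emb = shared + 0.05 * rng.normal(size=shared.shape)
misaligned = np.roll(expr_emb, 1, axis=0)  # shifts rows so every positive pair is broken

print(symmetric_info_nce(img_emb, expr_emb), symmetric_info_nce(img_emb, misaligned))
print(retrieve_patches(expr_emb[5], img_emb, k=3))
```

Correctly paired embeddings yield a much lower loss than the shifted pairing, and querying with spot 5's expression embedding ranks its own tissue patch first. In a real pipeline the image side would come from the pathology encoder being finetuned and the expression side from an omics encoder.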

Problem

Research questions and friction points this paper is trying to address.

Spatial Transcriptomics

Pathology Foundation Models

Morphomolecular Coupling

Self-supervised Learning

Cross-modal Representation

Innovation

Methods, ideas, or system contributions that make the work stand out.

Spatial Transcriptomics

Foundation Models

Self-supervised Learning