CARE: A Molecular-Guided Foundation Model with Adaptive Region Modeling for Whole Slide Image Analysis

📅 2026-02-25
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing foundation models in computational pathology often rely on backbone networks pretrained on natural images, which struggle to capture the heterogeneity and non-uniformity of tissue morphology, limiting clinical interpretability and utility. To address this, the authors propose CARE, a foundation model that integrates molecular information (RNA and protein profiles) into the regional modeling of histopathology images. CARE employs a two-stage self-supervised pretraining strategy: it first learns morphological representations from unlabeled whole-slide images, then leverages molecular data to guide the construction of biologically meaningful, morphologically coherent adaptive tissue regions. Notably, CARE requires no manual segmentation and uses only one-tenth of the pretraining data typical of mainstream models, yet achieves superior average performance across 33 diverse downstream tasks, including morphological classification, molecular prediction, and survival analysis, demonstrating strong generalization and clinical relevance.

📝 Abstract
Foundation models have recently achieved impressive success in computational pathology, demonstrating strong generalization across diverse histopathology tasks. However, existing models overlook the heterogeneous and non-uniform organization of pathological regions of interest (ROIs) because they rely on natural image backbones not tailored for tissue morphology. Consequently, they often fail to capture the coherent tissue architecture beyond isolated patches, limiting interpretability and clinical relevance. To address these challenges, we present Cross-modal Adaptive Region Encoder (CARE), a foundation model for pathology that automatically partitions whole-slide images (WSIs) into several morphologically relevant regions. Specifically, CARE employs a two-stage pretraining strategy: (1) a self-supervised unimodal pretraining stage that learns morphological representations from 34,277 WSIs without segmentation annotations, and (2) a cross-modal alignment stage that leverages RNA and protein profiles to refine the construction and representation of adaptive regions. This molecular guidance enables CARE to identify biologically relevant patterns and generate irregular yet coherent tissue regions, selecting the most representative area as the ROI. CARE supports a broad range of pathology-related tasks, using either the ROI feature or the slide-level feature obtained by aggregating adaptive regions. Using only one-tenth of the pretraining data typically required by mainstream foundation models, CARE achieves superior average performance across 33 downstream benchmarks, including morphological classification, molecular prediction, and survival analysis, and outperforms other foundation model baselines overall.
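The cross-modal alignment stage described above pairs region-level image embeddings with RNA/protein profile embeddings. The abstract does not spell out the exact objective; a common choice for this kind of two-modality alignment is a symmetric InfoNCE-style contrastive loss, sketched minimally below in NumPy (the function name, temperature value, and loss form are illustrative assumptions, not the paper's confirmed implementation):

```python
import numpy as np

def info_nce_alignment(region_emb, mol_emb, temperature=0.07):
    """Symmetric InfoNCE-style loss aligning region embeddings with
    molecular (RNA/protein) profile embeddings.

    region_emb, mol_emb: (N, d) arrays where row i of each modality
    belongs to the same tissue region. This is a conceptual sketch;
    CARE's actual alignment objective may differ.
    """
    # L2-normalize each modality so similarities are cosine similarities
    r = region_emb / np.linalg.norm(region_emb, axis=1, keepdims=True)
    m = mol_emb / np.linalg.norm(mol_emb, axis=1, keepdims=True)
    logits = r @ m.T / temperature  # (N, N): matched pairs on the diagonal

    def cross_entropy(l):
        l = l - l.max(axis=1, keepdims=True)  # numerical stability
        logp = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -np.mean(np.diag(logp))        # pick out matched pairs

    # Average the image-to-molecule and molecule-to-image directions
    return 0.5 * (cross_entropy(logits) + cross_entropy(logits.T))
```

Minimizing such a loss pulls each region's image embedding toward its own molecular profile and pushes it away from the profiles of other regions, which is one plausible mechanism for the "molecular guidance" the abstract attributes to stage two.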
Problem

Research questions and friction points this paper is trying to address.

computational pathology
whole slide image
region of interest
tissue morphology
foundation model
Innovation

Methods, ideas, or system contributions that make the work stand out.

adaptive region modeling
cross-modal alignment
molecular-guided foundation model
whole slide image analysis
self-supervised pretraining
Di Zhang
Xi'an Jiaotong University

Zhangpeng Gong
Xi'an Jiaotong University

Xiaobo Pang
Xi'an Jiaotong University

Jiashuai Liu
Xi'an Jiaotong University

Junbo Lu
Xi'an Jiaotong University

Hao Cui
University of California, Irvine
Image Watermarking

Jiusong Ge
Xi'an Jiaotong University

Zhi Zeng
Xi'an Jiaotong University
Natural Language Processing, Data Mining, Multimodal Learning, Fake News, Short Video

Kai Yi
MRC Laboratory of Molecular Biology
Deep Learning, Protein Design

Yinghua Li
KingMed

Si Liu
Fred Hutchinson Cancer Center
Genomics, Biostatistics, Anomaly Detection, Open Category Detection

Tingsong Yu
KingMed

Haoran Wang
BGI Research

Mireia Crispin-Ortuzar
University of Cambridge

Weimiao Yu
A*STAR

Chen Li
Xi'an Jiaotong University

Zeyu Gao
University of Cambridge
Deep Learning, Machine Learning, Image Processing, Medical Imaging, Hyperspectral Imaging