🤖 AI Summary
Existing foundation models in computational pathology often rely on backbone networks pretrained on natural images, which struggle to capture the heterogeneity and non-uniformity of tissue morphology, limiting clinical interpretability and utility. To address this, the work proposes CARE, a foundation model that integrates molecular information, specifically RNA and protein data, into the regional modeling of histopathology images. CARE employs a two-stage self-supervised pretraining strategy: it first learns morphological representations from unlabeled whole-slide images, then leverages molecular data to guide the generation of biologically meaningful and morphologically coherent adaptive tissue regions. Notably, CARE requires no manual segmentation and uses only one-tenth of the pretraining data typical of mainstream models, yet achieves superior average performance across 33 diverse downstream tasks, including morphological classification, molecular prediction, and survival analysis, demonstrating enhanced generalization and clinical relevance.
📝 Abstract
Foundation models have recently achieved impressive success in computational pathology, demonstrating strong generalization across diverse histopathology tasks. However, existing models overlook the heterogeneous and non-uniform organization of pathological regions of interest (ROIs) because they rely on natural image backbones not tailored for tissue morphology. Consequently, they often fail to capture coherent tissue architecture beyond isolated patches, limiting interpretability and clinical relevance. To address these challenges, we present Cross-modal Adaptive Region Encoder (CARE), a foundation model for pathology that automatically partitions whole-slide images (WSIs) into morphologically relevant regions. Specifically, CARE employs a two-stage pretraining strategy: (1) a self-supervised unimodal pretraining stage that learns morphological representations from 34,277 WSIs without segmentation annotations, and (2) a cross-modal alignment stage that leverages RNA and protein profiles to refine the construction and representation of adaptive regions. This molecular guidance enables CARE to identify biologically relevant patterns and generate irregular yet coherent tissue regions, from which the most representative area is selected as the ROI. CARE supports a broad range of pathology-related tasks, using either the ROI feature or the slide-level feature obtained by aggregating adaptive regions. Pretrained on only one-tenth of the data typically used by mainstream foundation models, CARE achieves superior average performance across 33 downstream benchmarks, including morphological classification, molecular prediction, and survival analysis, and outperforms other foundation model baselines overall.
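The abstract's second stage pairs image-derived region embeddings with matched molecular (RNA/protein) profiles. The paper's exact objective is not given here; cross-modal alignment of this kind is commonly implemented as a symmetric InfoNCE contrastive loss, as in CLIP-style training. Below is a minimal NumPy sketch of that generic loss, with all function and variable names illustrative rather than taken from CARE:

```python
# Hedged sketch: a generic symmetric InfoNCE loss aligning adaptive-region
# embeddings (image side) with molecular-profile embeddings. This is NOT
# CARE's published objective, only a common choice for such alignment.
import numpy as np

def l2_normalize(x, axis=-1, eps=1e-8):
    # Project embeddings onto the unit sphere so dot products are cosines.
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)

def info_nce(region_emb, mol_emb, temperature=0.07):
    """Symmetric InfoNCE over a batch of paired embeddings.

    region_emb: (B, D) embeddings of adaptive tissue regions
    mol_emb:    (B, D) embeddings of the matched RNA/protein profiles
    Row i of each matrix is assumed to come from the same sample.
    """
    z_img = l2_normalize(region_emb)
    z_mol = l2_normalize(mol_emb)
    logits = z_img @ z_mol.T / temperature  # (B, B); diagonal = positive pairs

    def xent(lg):
        # Cross-entropy with the diagonal as the target class (log-softmax).
        lg = lg - lg.max(axis=1, keepdims=True)
        logp = lg - np.log(np.exp(lg).sum(axis=1, keepdims=True))
        return -np.mean(np.diag(logp))

    # Average the image->molecule and molecule->image directions.
    return 0.5 * (xent(logits) + xent(logits.T))

rng = np.random.default_rng(0)
B, D = 8, 32
regions = rng.normal(size=(B, D))
loss_mismatched = info_nce(regions, rng.normal(size=(B, D)))       # random pairing
loss_matched = info_nce(regions, regions + 0.01 * rng.normal(size=(B, D)))
# Well-aligned pairs should yield a much lower loss than random pairings.
assert loss_matched < loss_mismatched
```

Minimizing such a loss pulls each region's embedding toward its own molecular profile and away from the other samples in the batch, which is one plausible mechanism for the "molecular guidance" the abstract describes.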