🤖 AI Summary
Existing pathology foundation models produce fragmented tile-level representations that are poorly suited to diagnostic tasks requiring unified whole-slide inference and links to clinically meaningful information. This work presents ASTRA, a weakly supervised pan-cancer framework that combines sparse mixture-of-experts contextualization, masked multi-model reconstruction, and contrastive alignment to structured pathology prompts. It builds a shared slide-level representation space using only structured annotation fields derived from slide-level metadata (classification category, cancer type, and anatomic site). Developed on 10,359 CHTN whole-slide images spanning 16 tumor types, ASTRA reaches macro-AUCs of up to 97.8% for 4-category classification, 99.7% for 3-class solid tumor typing, and 99.2% for 16-class cancer typing. For text-guided tumor localization without pixel-level supervision, it achieves a mean Dice of 0.897 on an annotated in-domain CHTN subset (n = 380) and 0.738 on an external TCGA cohort (n = 1,686).
📝 Abstract
The expanding ecosystem of pathology foundation models has produced powerful but fragmented tile-level representations, limiting their use in clinical tasks that require unified slide-level reasoning and interpretable linkage to clinically meaningful information. We present ASTRA, a pan-cancer framework that integrates heterogeneous foundation-model representations into a shared slide-level representation space and semantically grounds that space using structured pathology annotation fields, including classification category, cancer type, and anatomic site. ASTRA combines sparse mixture-of-experts contextualization, masked multi-model reconstruction, and contrastive alignment to structured pathology prompts to learn slide representations that support 4-category classification, 3-class solid tumor typing, 16-class cancer typing, and text-guided tumor localization without pixel-level supervision. Developed on a CHTN cohort of 10,359 whole-slide images (WSIs) spanning 16 tumor types, ASTRA consistently improves pan-cancer classification across four pathology foundation-model backbones, achieving up to 97.8% macro-AUC for 4-category classification, 99.7% for 3-class solid tumor typing, and 99.2% for 16-class cancer typing. For tumor localization, ASTRA achieves a mean Dice of 0.897 on an annotated in-domain CHTN subset (n = 380) spanning 16 cancer types and 0.738 on an external TCGA cohort (n = 1,686) spanning four cancer types. These results demonstrate that minimal structured pathology annotation fields derived from slide-level metadata can provide effective semantic supervision for unified slide representation learning, enabling both pan-cancer prediction and weakly supervised tumor localization within a single framework.
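For readers unfamiliar with the localization metric, the following is a minimal sketch of how a mean Dice score over a set of slides could be computed. It is not taken from the ASTRA code: the function names `dice_coefficient` and `mean_dice`, the flat 0/1 mask representation, and the convention that two empty masks score 1.0 are all assumptions made for illustration.

```python
from typing import Iterable, Sequence, Tuple


def dice_coefficient(pred: Sequence[int], truth: Sequence[int]) -> float:
    """Dice = 2|P ∩ T| / (|P| + |T|) for two binary masks given as flat 0/1 sequences."""
    if len(pred) != len(truth):
        raise ValueError("masks must have the same length")
    # Count positions marked as tumor in both masks.
    intersection = sum(1 for p, t in zip(pred, truth) if p and t)
    # Total number of tumor positions across the two masks.
    total = sum(1 for p in pred if p) + sum(1 for t in truth if t)
    # Assumed convention: two empty masks count as perfect agreement.
    if total == 0:
        return 1.0
    return 2.0 * intersection / total


def mean_dice(pairs: Iterable[Tuple[Sequence[int], Sequence[int]]]) -> float:
    """Average the per-slide Dice over (predicted, ground-truth) mask pairs."""
    scores = [dice_coefficient(p, t) for p, t in pairs]
    if not scores:
        raise ValueError("at least one mask pair is required")
    return sum(scores) / len(scores)
```

Under this definition a score of 1.0 means the predicted and annotated tumor regions coincide exactly, and 0.0 means they do not overlap at all; whether the paper averages per slide or pools pixels across slides is not stated in the abstract.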