🤖 AI Summary
This study addresses the challenge of deciphering coordinated molecular and phenotypic mechanisms among cells in tumor tissues. We propose CellSymphony, a multimodal integration framework that pioneers the use of foundation-model embeddings for joint representation learning of Xenium spatial transcriptomics and H&E histology images, enabling cross-modal alignment at subcellular resolution. By combining single-cell image feature extraction, spatial gene expression modeling, and multimodal fusion, CellSymphony achieves accurate cell-type annotation and identifies functional niches within the tumor microenvironment (TME). Validated across breast, colorectal, and lung carcinomas, CellSymphony improves functional cell annotation and systematically uncovers spatially restricted, molecularly distinct TME domains, establishing a scalable, integrative paradigm for elucidating phenotype–molecule coupling in tissue ecosystems.
📝 Abstract
Xenium, a new spatial transcriptomics platform, enables subcellular-resolution profiling of complex tumor tissues. Despite the rich morphological information in histology images, extracting robust cell-level features and integrating them with spatial transcriptomics data remains a critical challenge. We introduce CellSymphony, a flexible multimodal framework that leverages foundation model-derived embeddings from both Xenium transcriptomic profiles and histology images at true single-cell resolution. By learning joint representations that fuse spatial gene expression with morphological context, CellSymphony achieves accurate cell type annotation and uncovers distinct microenvironmental niches across three cancer types. This work highlights the potential of foundation models and multimodal fusion for deciphering the physiological and phenotypic orchestration of cells within complex tissue ecosystems.
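To make the fusion idea concrete, below is a minimal sketch of one way per-cell embeddings from two foundation models could be combined into a joint representation. This is an illustrative late-fusion scheme under assumed inputs (precomputed `expr_emb` and `morph_emb` matrices); the function names, dimensions, and the random projection are hypothetical and are not CellSymphony's actual architecture.

```python
import numpy as np

def l2_normalize(x, axis=-1, eps=1e-8):
    # Scale each embedding to unit length so neither modality dominates the fusion.
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)

def fuse_embeddings(expr_emb, morph_emb, d_out=64, seed=0):
    """Late-fuse per-cell transcriptomic and morphology embeddings (illustrative).

    expr_emb:  (n_cells, d_expr) gene-expression foundation-model embeddings
    morph_emb: (n_cells, d_morph) histology-image foundation-model embeddings
    Returns:   (n_cells, d_out) joint representations.
    """
    # Normalize each modality, then concatenate along the feature axis.
    fused = np.concatenate([l2_normalize(expr_emb), l2_normalize(morph_emb)], axis=1)
    # Project to a shared low-dimensional space; a fixed random projection
    # stands in here for a learned fusion layer.
    rng = np.random.default_rng(seed)
    proj = rng.standard_normal((fused.shape[1], d_out)) / np.sqrt(fused.shape[1])
    return fused @ proj

# Toy usage: 10 cells, 128-dim expression and 256-dim morphology embeddings.
expr = np.random.default_rng(1).standard_normal((10, 128))
morph = np.random.default_rng(2).standard_normal((10, 256))
joint = fuse_embeddings(expr, morph)  # shape: (10, 64)
```

In practice a learned fusion module (e.g. an MLP or attention layer trained with the annotation objective) would replace the random projection, but the data flow, aligning per-cell features from both modalities and mapping them into one space, is the same.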