AI Summary
Existing methods for cross-modal prediction of spatial transcriptomics (ST) gene expression from histology images rely mainly on spot-level image-gene alignment, model the hierarchical structure of ST data insufficiently, and do not address the inherent information asymmetry between visual image representations and molecular gene representations. To overcome these limitations, we propose the first hyperbolic-space-based multi-level image-gene representation learning framework. Our approach introduces a hierarchical hyperbolic alignment mechanism that jointly models the spot- and niche-level hierarchical semantics of both tissue microenvironments and gene expression profiles. By integrating hyperbolic neural networks with hierarchical contrastive learning, it achieves nonlinear cross-modal alignment. Evaluated on four publicly available ST datasets spanning diverse tissue types, our method achieves state-of-the-art performance and significantly improves gene expression prediction accuracy. This work establishes a new paradigm for low-cost, high-throughput spatial transcriptomics analysis.
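To make the idea of hyperbolic contrastive alignment concrete, here is a minimal NumPy sketch built on stated assumptions; it is not HyperST's implementation. It assumes a Poincaré ball of curvature -c (with c = 1), treats encoder outputs as tangent vectors at the origin that are mapped onto the ball with the exponential map, and uses a symmetric InfoNCE loss whose logits are negative geodesic distances divided by a temperature. The names `expmap0`, `poincare_dist`, `log_softmax`, and `hyperbolic_infonce`, along with the temperature `tau`, are hypothetical.

```python
import numpy as np

def expmap0(v, c=1.0, eps=1e-6):
    """Exponential map at the origin: tangent vectors (n, d) -> points in the Poincare ball of curvature -c."""
    sqrt_c = np.sqrt(c)
    norm = np.clip(np.linalg.norm(v, axis=-1, keepdims=True), eps, None)
    return np.tanh(sqrt_c * norm) * v / (sqrt_c * norm)

def poincare_dist(x, y, c=1.0, eps=1e-9):
    """Pairwise geodesic distances between rows of x (n, d) and rows of y (m, d)."""
    sqrt_c = np.sqrt(c)
    x2 = np.sum(x * x, axis=-1)[:, None]                           # (n, 1)
    y2 = np.sum(y * y, axis=-1)[None, :]                           # (1, m)
    diff2 = np.sum((x[:, None, :] - y[None, :, :]) ** 2, axis=-1)  # (n, m)
    denom = np.clip((1 - c * x2) * (1 - c * y2), eps, None)
    return np.arccosh(np.maximum(1 + 2 * c * diff2 / denom, 1.0)) / sqrt_c

def log_softmax(z, axis):
    """Numerically stable log-softmax along one axis."""
    z = z - z.max(axis=axis, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=axis, keepdims=True))

def hyperbolic_infonce(img_feats, gene_feats, tau=0.1, c=1.0):
    """Symmetric InfoNCE: row i of img_feats and row i of gene_feats form the positive pair."""
    x, y = expmap0(img_feats, c), expmap0(gene_feats, c)
    logits = -poincare_dist(x, y, c) / tau  # closer in the ball -> larger logit
    idx = np.arange(len(x))
    img_to_gene = -log_softmax(logits, axis=1)[idx, idx].mean()
    gene_to_img = -log_softmax(logits, axis=0)[idx, idx].mean()
    return 0.5 * (img_to_gene + gene_to_img)

# Toy data: 8 spots, 16-dim embeddings from each modality.
rng = np.random.default_rng(0)
img = 0.1 * rng.normal(size=(8, 16))
gene_matched = img + 0.001 * rng.normal(size=(8, 16))  # near-copies of the image embeddings
gene_random = 0.1 * rng.normal(size=(8, 16))           # unrelated embeddings
loss_matched = hyperbolic_infonce(img, gene_matched)
loss_random = hyperbolic_infonce(img, gene_random)
print(loss_matched < loss_random)
```

Using negative geodesic distance as the similarity pulls matched image-gene pairs together along hyperbolic geodesics. Distances grow rapidly toward the boundary of the ball, and this growth is what allows tree-like hierarchies to be embedded with low distortion.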
Abstract
Spatial Transcriptomics (ST) combines the strengths of pathology images and gene expression, linking molecular profiles with tissue structure to enable comprehensive analysis of spot-level function. Predicting gene expression from histology images is a cost-effective alternative to expensive ST technologies. However, existing methods focus mainly on spot-level image-to-gene matching and fail to leverage the full hierarchical structure of ST data, especially on the gene expression side, which leads to incomplete image-gene alignment. A further challenge is the inherent information asymmetry between the modalities: gene expression profiles carry richer molecular detail that may lack salient visual correlates in histological images, so bridging this modality gap demands a more sophisticated representation learning approach. We propose HyperST, a framework for ST prediction that learns multi-level image-gene representations by modeling the data's inherent hierarchy in hyperbolic space, a natural geometric setting for such structures. First, we design Multi-Level Representation Extractors that capture both spot-level and niche-level representations from each modality, providing context beyond individual spot-level image-gene pairs. Second, we introduce a Hierarchical Hyperbolic Alignment module that unifies these representations, performing spatial alignment while hierarchically structuring the image and gene embeddings. This alignment strategy enriches the image representations with molecular semantics and significantly improves cross-modal prediction. HyperST achieves state-of-the-art performance on four public datasets from different tissues, paving the way for more scalable and accurate spatial transcriptomics prediction.
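The spot-level versus niche-level distinction can be sketched the same way. This is a hypothetical illustration, not HyperST's extractor. It assumes that a spot's niche is the spot itself plus its nearest spatial neighbours (for example, k = 7 would cover a spot and its six immediate neighbours on a hexagonal Visium-style grid), that niche features are the mean of the member spots' features, and that features are tangent vectors mapped into a Poincaré ball with the exponential map at the origin. `knn_indices`, `niche_features`, and `poincare_radius` are illustrative names.

```python
import numpy as np

def knn_indices(coords, k):
    """Indices of each spot's k nearest spots by Euclidean distance (the spot itself comes first)."""
    d2 = np.sum((coords[:, None, :] - coords[None, :, :]) ** 2, axis=-1)
    return np.argsort(d2, axis=1, kind="stable")[:, :k]

def niche_features(spot_feats, coords, k=7):
    """Hypothetical niche-level features: mean of the features of each spot's k nearest spots."""
    idx = knn_indices(coords, k)         # (n, k)
    return spot_feats[idx].mean(axis=1)  # (n, k, d) -> (n, d)

def poincare_radius(v, c=1.0):
    """Euclidean norm of expmap0(v) in the Poincare ball, i.e. how far the point sits from the origin."""
    sqrt_c = np.sqrt(c)
    return np.tanh(sqrt_c * np.linalg.norm(v, axis=-1)) / sqrt_c

# Toy example: a 5 x 5 grid of spots with 8-dim spot-level features.
rng = np.random.default_rng(0)
xs, ys = np.meshgrid(np.arange(5), np.arange(5))
coords = np.stack([xs.ravel(), ys.ravel()], axis=1).astype(float)
spot = rng.normal(size=(25, 8))
idx = knn_indices(coords, 5)              # spot + 4 grid neighbours (next-closest spots fill in at edges)
niche = niche_features(spot, coords, k=5)
r_niche = poincare_radius(niche)          # (25,)
r_members = poincare_radius(spot[idx])    # (25, 5)
print(np.all(r_niche <= r_members.max(axis=1)))
```

Because the norm of a mean never exceeds the largest member norm and tanh is monotone, each pooled niche point lies no farther from the origin of the ball than its farthest member spot. This is consistent with the common convention in hyperbolic models of placing more general, context-level embeddings nearer the origin.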