🤖 AI Summary
Existing spatial transcriptomics methods crop image spots from pathology slides and map each to a single gene expression vector, which loses spatial resolution (each spot can contain multiple cells with distinct expression) and locks analysis to a fixed scale. This work proposes PixNet, a dense prediction network that generates a continuous, pixel-level gene expression map directly from the pathology image, breaking the conventional "spot-to-vector" paradigm. Expression for a spot of interest is then obtained by aggregating the dense map within that region, supporting spots of arbitrary size and prediction across multiple spatial scales. Evaluated on three benchmark ST datasets, PixNet outperforms state-of-the-art methods, with consistent gains in cross-scale prediction accuracy.
📝 Abstract
Spatial transcriptomics (ST) measures gene expression at fine-grained spatial resolution, offering insights into tissue molecular landscapes. Previous methods for spatial gene expression prediction usually crop spots of interest from pathology tissue slide images and learn a model that maps each spot to a single gene expression profile. However, this paradigm fundamentally loses spatial resolution of gene expression: 1) each spot often contains multiple cells with distinct gene expression profiles; 2) spots are cropped at fixed resolutions, limiting the ability to predict gene expression at varying spatial scales. To address these limitations, this paper presents PixNet, a dense prediction network capable of predicting spatially resolved gene expression across spots of varying sizes and scales directly from pathology images. Different from previous methods that map individual spots to gene expression values, we generate a dense, continuous gene expression map from the pathology image and aggregate values within spots of interest to predict their gene expression. PixNet outperforms state-of-the-art methods on three common ST datasets, while showing superior performance in predicting gene expression across multiple spatial scales. The source code will be publicly available.
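The aggregation step described above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: it assumes the model has already produced a dense per-pixel expression map of shape (H, W, G) for G genes, and shows how a spot-level expression vector for an arbitrarily sized circular spot could be obtained by averaging inside its mask. The function name and the circular spot shape are assumptions for illustration.

```python
import numpy as np

def aggregate_spot_expression(dense_map, center, radius):
    """Average a dense per-pixel expression map (H, W, G) over a circular
    spot with the given (row, col) center and radius in pixels.

    Hypothetical helper illustrating the dense-map-then-aggregate idea;
    the actual PixNet aggregation may differ.
    """
    h, w, _ = dense_map.shape
    rows, cols = np.ogrid[:h, :w]
    # Boolean mask of pixels falling inside the circular spot.
    mask = (rows - center[0]) ** 2 + (cols - center[1]) ** 2 <= radius ** 2
    # dense_map[mask] has shape (n_pixels_in_spot, G); average over pixels.
    return dense_map[mask].mean(axis=0)

# Toy dense map: a 64x64-pixel image region with 5 genes per pixel.
dense_map = np.random.rand(64, 64, 5)
# Spots of different radii reuse the same dense map, so the same
# prediction supports multiple spatial scales.
small_spot = aggregate_spot_expression(dense_map, center=(32, 32), radius=4)
large_spot = aggregate_spot_expression(dense_map, center=(32, 32), radius=16)
print(small_spot.shape, large_spot.shape)  # (5,) (5,)
```

Because the dense map is computed once and spots are just masks over it, the same model output serves subcellular-sized and tissue-level queries without re-cropping or retraining.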