🤖 AI Summary
Spatial transcriptomics (ST) is prohibitively expensive, and predicting gene expression from H&E images faces two key challenges: existing methods neglect gene co-expression structure, and they model discrete count data with continuous regression. To address these, we propose GenAR, a multi-scale autoregressive framework that, for the first time, formulates spatial gene expression prediction as a codebook-free discrete token sequence generation task. GenAR models inter-gene dependencies via hierarchical gene clustering, integrates histological features with spatial coordinate embeddings for conditional decoding, and employs a coarse-to-fine autoregressive strategy to ensure biological consistency. Evaluated on four ST datasets spanning different tissue types, GenAR improves both predictive accuracy and biological plausibility, including higher correlation with ground-truth expression, improved spatial pattern fidelity, and better preservation of gene modules. Our approach establishes a new paradigm for cost-effective, high-fidelity spatial molecular mapping.
📝 Abstract
Spatial Transcriptomics (ST) offers spatially resolved gene expression but remains costly. Predicting expression directly from widely available Hematoxylin and Eosin (H&E)-stained images presents a cost-effective alternative. However, most computational approaches (i) predict each gene independently, overlooking co-expression structure, and (ii) cast the task as continuous regression even though expression is measured as discrete counts. This mismatch can yield biologically implausible outputs and complicate downstream analyses. We introduce GenAR, a multi-scale autoregressive framework that refines predictions from coarse to fine. GenAR clusters genes into hierarchical groups to expose cross-gene dependencies, models expression as codebook-free discrete token generation to directly predict raw counts, and conditions decoding on fused histological and spatial embeddings. From an information-theoretic perspective, the discrete formulation avoids log-induced biases and the coarse-to-fine factorization aligns with a principled conditional decomposition. Extensive experimental results on four ST datasets across different tissue types demonstrate that GenAR achieves state-of-the-art performance, offering potential implications for precision medicine and cost-effective molecular profiling. Code is publicly available at https://github.com/oyjr/genar.
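
To make the coarse-to-fine, codebook-free idea concrete, the following is a minimal sketch and not the GenAR implementation. It assumes a hypothetical gene hierarchy given as nested partitions of gene indices, and it assumes that a coarse token is simply the summed raw count of its gene group; the abstract does not state how GenAR aggregates counts. The function names `build_multiscale_tokens` and `autoregressive_pairs` are illustrative only, and the conditioning on histological and spatial embeddings is omitted.

```python
from typing import List, Sequence, Tuple

# One scale of the (hypothetical) hierarchy: a partition of gene indices.
Partition = Sequence[Sequence[int]]


def build_multiscale_tokens(
    counts: Sequence[int], hierarchy: Sequence[Partition]
) -> List[List[int]]:
    """Turn one spot's raw counts into integer tokens per scale, coarsest first.

    Assumption: each token is the summed count of its gene group, so tokens
    remain integers and no learned codebook is involved.
    """
    scales = []
    for partition in hierarchy:
        scales.append([sum(counts[g] for g in group) for group in partition])
    return scales


def autoregressive_pairs(
    scales: Sequence[Sequence[int]],
) -> List[Tuple[List[int], List[int]]]:
    """Pair each scale's tokens with the tokens of all coarser scales.

    This mirrors a chain-rule factorization in which scale k is predicted
    conditioned on scales 1..k-1.
    """
    pairs = []
    context: List[int] = []
    for tokens in scales:
        pairs.append((list(context), list(tokens)))
        context.extend(tokens)
    return pairs


# Toy example: 4 genes, 3 scales (all genes, two clusters, single genes).
counts = [3, 0, 5, 2]
hierarchy = [
    [[0, 1, 2, 3]],
    [[0, 1], [2, 3]],
    [[0], [1], [2], [3]],
]
scales = build_multiscale_tokens(counts, hierarchy)
pairs = autoregressive_pairs(scales)
```

In this toy setup the finest scale reproduces the raw counts, so a model trained on the `(context, target)` pairs would emit integer counts directly at the last step rather than regressing log-transformed values.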