GenAR: Next-Scale Autoregressive Generation for Spatial Gene Expression Prediction

📅 2025-10-05
🤖 AI Summary
Spatial transcriptomics (ST) is costly, and predicting gene expression from H&E images faces two key challenges: existing methods neglect gene co-expression structure and model discrete count data with continuous regression. To address these, we propose GenAR, a multi-scale autoregressive framework that formulates spatial gene expression prediction as a codebook-free discrete token generation task. GenAR models inter-gene dependencies via hierarchical gene clustering, fuses histological features with spatial coordinate embeddings for conditional decoding, and employs a coarse-to-fine autoregressive strategy. Evaluated on four ST datasets spanning different tissue types, GenAR achieves state-of-the-art performance, improving both predictive accuracy and biological plausibility. The approach offers a route to cost-effective spatial molecular profiling.

📝 Abstract
Spatial Transcriptomics (ST) offers spatially resolved gene expression but remains costly. Predicting expression directly from widely available Hematoxylin and Eosin (H&E) stained images presents a cost-effective alternative. However, most computational approaches (i) predict each gene independently, overlooking co-expression structure, and (ii) cast the task as continuous regression despite expression being discrete counts. This mismatch can yield biologically implausible outputs and complicate downstream analyses. We introduce GenAR, a multi-scale autoregressive framework that refines predictions from coarse to fine. GenAR clusters genes into hierarchical groups to expose cross-gene dependencies, models expression as codebook-free discrete token generation to directly predict raw counts, and conditions decoding on fused histological and spatial embeddings. From an information-theoretic perspective, the discrete formulation avoids log-induced biases and the coarse-to-fine factorization aligns with a principled conditional decomposition. Extensive experimental results on four Spatial Transcriptomics datasets across different tissue types demonstrate that GenAR achieves state-of-the-art performance, offering potential implications for precision medicine and cost-effective molecular profiling. Code is publicly available at https://github.com/oyjr/genar.
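One common form of log-induced bias can be illustrated with a few lines of Python. This is a minimal sketch and not the paper's own analysis: the counts are made up, and the helper names are hypothetical. A regressor trained with squared error on log1p-transformed counts learns the mean in log space, and inverting that mean underestimates the mean of the raw counts (Jensen's inequality for a concave transform).

```python
import math


def arithmetic_mean(values):
    """Plain average of a list of numbers."""
    return sum(values) / len(values)


def back_transformed_log_mean(counts):
    """Average in log1p space, then invert with expm1.

    This mimics what an ideal squared-error regressor on log1p targets
    would output after its prediction is mapped back to count space.
    """
    mean_log = arithmetic_mean([math.log1p(c) for c in counts])
    return math.expm1(mean_log)


# Hypothetical raw counts of one gene across five similar spots (assumption).
counts = [0, 0, 1, 10, 100]

true_mean = arithmetic_mean(counts)
log_space_mean = back_transformed_log_mean(counts)

# The back-transformed value is systematically below the raw-count mean.
print(f"mean of raw counts:        {true_mean:.2f}")
print(f"back-transformed log mean: {log_space_mean:.2f}")
```

Predicting the integer counts directly, as the abstract describes, sidesteps this particular transform and its inversion.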
Problem

Research questions and friction points this paper is trying to address.

Predicting spatial gene expression from H&E images cost-effectively
Modeling gene dependencies and co-expression structures jointly
Generating discrete count outputs instead of continuous regression
Innovation

Methods, ideas, or system contributions that make the work stand out.

Autoregressive framework refines predictions from coarse to fine
Clusters genes hierarchically to model cross-gene dependencies
Models expression as discrete token generation for raw counts
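The coarse-to-fine idea in the points above can be sketched as follows. This is an illustrative assumption, not GenAR's actual implementation: the source does not say how coarse scales are built, so the sketch simply sums raw counts over contiguous groups of a gene list that is assumed to be ordered so that co-expressed genes sit next to each other. The function name `build_scales` and the group sizes are hypothetical.

```python
def build_scales(counts, group_sizes):
    """Sum integer counts over contiguous gene groups, one list per scale.

    `group_sizes` is given coarsest first; a size of 1 reproduces the
    per-gene counts. Sum pooling is an assumption made for illustration.
    """
    scales = []
    for size in group_sizes:
        if len(counts) % size != 0:
            raise ValueError("group size must divide the number of genes")
        scales.append(
            [sum(counts[i:i + size]) for i in range(0, len(counts), size)]
        )
    return scales


# Hypothetical raw counts for 8 genes at one spot, in cluster order.
spot_counts = [3, 0, 1, 2, 0, 0, 7, 1]

# Coarsest scale first: all genes, two clusters, four sub-clusters, genes.
scales = build_scales(spot_counts, group_sizes=[8, 4, 2, 1])

# Codebook-free tokens: each integer count is used as the token itself.
# An autoregressive decoder would emit this sequence left to right, so
# every finer scale is conditioned on the coarser totals before it.
token_sequence = [token for scale in scales for token in scale]

for size, scale in zip([8, 4, 2, 1], scales):
    print(f"group size {size}: {scale}")
```

Because every scale holds integer counts, no learned codebook is needed to map values to tokens, and the coarse totals give the decoder a cluster-level summary before it commits to individual genes.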
Jiarui Ouyang
The Hong Kong University of Science and Technology
Yihui Wang
The Hong Kong University of Science and Technology
Yihang Gao
Research Fellow, National University of Singapore
Optimization, Machine Learning, Tensor Computation, Large Language Models
Yingxue Xu
The Hong Kong University of Science and Technology
Multimodal Learning, Survival Analysis, Computational Pathology
Shu Yang
The Hong Kong University of Science and Technology
Hao Chen
The Hong Kong University of Science and Technology