🤖 AI Summary
Spatial transcriptomics (ST) data acquisition is costly, and fixed-grid sampling introduces redundancy in morphologically or biologically homogeneous regions, exacerbating data scarcity. To address this, we propose the first single-cell-guided reinforcement learning framework for active ST sampling. Our method leverages embeddings from a single-cell foundation model to encode prior biological knowledge, integrates spatial density modeling with multi-source data alignment, and enables intelligent, budget-constrained spot selection. Furthermore, we design a hybrid regression-retrieval prediction network coupled with cell-type-aware soft-label supervision to enhance gene expression reconstruction accuracy under extreme data scarcity (<10% sampled spots). Evaluated on three public ST datasets, our approach significantly outperforms existing methods in both sampling efficiency and reconstruction fidelity, achieving state-of-the-art performance across all benchmarks.
📝 Abstract
Spatial transcriptomics (ST) is an emerging technology that enables researchers to investigate the molecular relationships underlying tissue morphology. However, acquiring ST data remains prohibitively expensive, and traditional fixed-grid sampling strategies produce redundant measurements of morphologically similar or biologically uninformative regions, leaving current methods constrained by scarce data. Single-cell sequencing, a well-established field, can supply rich biological data as an auxiliary source to mitigate this limitation. To bridge these gaps, we introduce SCR2-ST, a unified framework that leverages single-cell prior knowledge to guide efficient data acquisition and accurate expression prediction. SCR2-ST integrates a single-cell-guided reinforcement learning (SCRL) active sampling strategy with a hybrid regression-retrieval prediction network, SCR2Net. SCRL combines single-cell foundation model embeddings with spatial density information to construct biologically grounded reward signals, enabling selective acquisition of informative tissue regions under constrained sequencing budgets. SCR2Net then exploits the actively sampled data through a hybrid architecture that combines regression-based modeling with retrieval-augmented inference: a majority cell-type filtering mechanism suppresses noisy matches, and the retrieved expression profiles serve as soft labels for auxiliary supervision. We evaluated SCR2-ST on three public ST datasets, demonstrating state-of-the-art performance in both sampling efficiency and prediction accuracy, particularly under low-budget scenarios. Code is publicly available at: https://github.com/hrlblab/SCR2ST
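The retrieval branch described in the abstract (nearest-neighbour lookup, majority cell-type filtering, and soft labels used as auxiliary supervision) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `retrieve_soft_label` and `hybrid_loss`, the cosine-similarity retrieval, the unweighted averaging, the mean-squared-error terms, and the weight `alpha` are all assumptions made for illustration only.

```python
from collections import Counter

import numpy as np


def retrieve_soft_label(query_emb, bank_embs, bank_expr, bank_cell_types, k=5):
    """Hypothetical helper: build a soft label from the k nearest bank entries.

    Neighbours whose cell type differs from the majority type among the
    retrieved set are discarded before averaging (assumed filtering rule).
    """
    # Cosine similarity between the query and every entry in the bank.
    q = query_emb / np.linalg.norm(query_emb)
    b = bank_embs / np.linalg.norm(bank_embs, axis=1, keepdims=True)
    sims = b @ q

    # Indices of the k most similar entries.
    top = np.argsort(-sims)[:k]

    # Majority vote over the cell types of the retrieved neighbours.
    majority = Counter(bank_cell_types[i] for i in top).most_common(1)[0][0]
    keep = [i for i in top if bank_cell_types[i] == majority]

    # Unweighted mean of the surviving expression profiles (an assumption).
    return majority, bank_expr[keep].mean(axis=0)


def hybrid_loss(pred, target, soft_label, alpha=0.5):
    """Hypothetical objective: regression loss plus a soft-label auxiliary term."""
    regression = np.mean((pred - target) ** 2)
    auxiliary = np.mean((pred - soft_label) ** 2)
    return regression + alpha * auxiliary


# Toy usage with made-up numbers: three close neighbours, one of a different type.
bank_embs = np.array([[1.0, 0.0], [0.9, 0.1], [0.8, 0.2], [0.0, 1.0]])
bank_expr = np.array([[1.0, 2.0], [3.0, 4.0], [100.0, 100.0], [0.0, 0.0]])
bank_cell_types = ["T", "T", "B", "B"]

cell_type, soft = retrieve_soft_label(
    np.array([1.0, 0.0]), bank_embs, bank_expr, bank_cell_types, k=3
)
```

In this toy case the third neighbour has an outlying profile and a minority cell type, so it is filtered out before the soft label is computed, which is the noise-suppression effect the abstract attributes to majority cell-type filtering.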