🤖 AI Summary
Existing single-cell generative methods often overlook higher-order biological relationships among genes, which limits their ability to model multiple experimental conditions and to generalize across them. This work proposes SAVE, a novel framework that introduces gene module attention for the first time. SAVE uses a conditional Transformer to aggregate semantically related genes into functional modules, and combines flow matching with a condition-masking strategy to model single-cell expression under multiple conditions in a unified way. This design enables extrapolative generation for unseen condition combinations and improves generalization in low-resource and combinatorial holdout settings. Experiments show that SAVE outperforms state-of-the-art methods in conditional generation, batch correction, and perturbation prediction, with gains in both generation fidelity and extrapolation performance.
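To make the idea of aggregating genes into modules concrete, here is a minimal NumPy sketch. It is not SAVE's implementation: the function name `genes_to_module_tokens`, the precomputed `module_ids` assignment, and the use of plain mean pooling (in place of the learned attention the summary describes) are all assumptions made for illustration.

```python
import numpy as np

def genes_to_module_tokens(expr, module_ids, n_modules):
    """Collapse a (cells, genes) matrix into (cells, modules) tokens.

    Assumption: each gene belongs to exactly one module, given by
    module_ids. Mean pooling stands in for a learned aggregation.
    """
    expr = np.asarray(expr, dtype=float)
    module_ids = np.asarray(module_ids)
    sums = np.zeros((n_modules, expr.shape[0]))
    # Accumulate each gene's expression into the row of its module.
    np.add.at(sums, module_ids, expr.T)
    # Number of genes per module; guard against empty modules.
    counts = np.bincount(module_ids, minlength=n_modules)
    return (sums / np.maximum(counts, 1)[:, None]).T

# Two cells, four genes, two hypothetical modules of two genes each.
expr = np.array([[1.0, 3.0, 5.0, 7.0],
                 [2.0, 4.0, 6.0, 8.0]])
tokens = genes_to_module_tokens(expr, [0, 0, 1, 1], n_modules=2)
```

Each cell is then represented by one token per module rather than one token per gene, which shortens the sequence a Transformer has to attend over.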
📝 Abstract
Modeling single-cell gene expression across diverse biological and technical conditions is crucial for characterizing cellular states and simulating unseen scenarios. Existing methods often treat genes as independent tokens, overlooking the higher-order biological relationships among them, which limits their performance. We introduce SAVE, a unified generative framework based on conditional Transformers for multi-condition single-cell modeling. SAVE uses a coarse-grained representation that groups semantically related genes into blocks, capturing higher-order dependencies among gene modules. A Flow Matching mechanism and a condition-masking strategy further support flexible simulation and enable generalization to unseen condition combinations. We evaluate SAVE on a range of benchmarks, including conditional generation, batch effect correction, and perturbation prediction. SAVE consistently outperforms state-of-the-art methods in generation fidelity and extrapolative generalization, especially in low-resource or combinatorially held-out settings. Overall, SAVE offers a scalable and generalizable solution for modeling complex single-cell data, with broad utility in virtual cell synthesis and biological interpretation. Our code is publicly available at https://github.com/fdu-wangfeilab/sc-save
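The abstract mentions Flow Matching and condition masking without giving details, so the following is only a generic sketch of how the two ingredients are commonly set up, not the authors' code. It assumes the standard linear-interpolation form of flow matching (noise `x0`, data `x1`, target velocity `x1 - x0`) and a hypothetical `MASK` id that replaces condition labels at random during training; the function names and the masking probability are illustrative.

```python
import numpy as np

MASK = -1  # hypothetical id meaning "condition unknown / masked"

def flow_matching_pair(x1, rng):
    """Build one training example for linear-path flow matching.

    x1: (batch, features) data samples. Returns the interpolated
    point x_t, the time t, and the velocity a model would regress.
    """
    x0 = rng.standard_normal(x1.shape)      # noise endpoint
    t = rng.uniform(size=(x1.shape[0], 1))  # one time per sample
    x_t = (1.0 - t) * x0 + t * x1           # point on the straight path
    v_target = x1 - x0                      # constant velocity of the path
    return x_t, t, v_target

def mask_conditions(cond, p_mask, rng):
    """Independently replace each condition label with MASK."""
    drop = rng.uniform(size=cond.shape) < p_mask
    return np.where(drop, MASK, cond)

rng = np.random.default_rng(0)
x1 = np.ones((4, 3))                   # stand-in for expression features
cond = np.array([[0, 2], [1, 2], [0, 3], [1, 3]])  # e.g. (batch, cell type)
x_t, t, v_target = flow_matching_pair(x1, rng)
cond_in = mask_conditions(cond, p_mask=0.5, rng=rng)
```

A network trained to predict `v_target` from `(x_t, t, cond_in)` sees every subset of conditions during training, which is one plausible way a model could later be queried with condition combinations it never observed together.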