STRAND: Sequence-Conditioned Transport for Single-Cell Perturbations

📅 2026-02-10
🤖 AI Summary
Gene-level perturbation models assign a single representation to every perturbation of a given gene, so they cannot capture the distinct transcriptional responses elicited by perturbations at different genomic loci. This limits both mechanistic modeling of gene regulation and zero-shot generalization. This work represents each perturbation by the regulatory DNA sequence at its targeted locus rather than by a gene identifier. By combining sequence encoding with conditional transport and training on single-cell CRISPR perturbation data, the authors build a locus-specific generative model that supports zero-shot prediction across approximately 95% of the genome. The method improves discrimination scores by up to 33% in low-sample regimes, achieves the best average rank on unseen-gene perturbation benchmarks, and raises Pearson correlation by up to 0.14 when transferring to novel cell lines. It also resolves functional differences between alternative transcription start sites that gene-level models miss.
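
The Pearson figure quoted above can be made concrete with a small sketch. The source does not say exactly which quantities are correlated; the example below assumes the common convention of correlating predicted and observed changes in mean expression relative to control. All expression values are made up for illustration, and `pearson` is a hypothetical helper, not code from the paper.

```python
import math
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("inputs must have equal length of at least 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


# Illustrative (made-up) mean expression for four genes.
control_mean = [1.0, 2.0, 3.0, 4.0]
observed_mean = [1.5, 1.0, 3.0, 6.0]   # measured after the perturbation
predicted_mean = [1.4, 1.2, 3.1, 5.0]  # output of some hypothetical model

# Assumption: the score is computed on changes relative to control.
observed_delta = [o - c for o, c in zip(observed_mean, control_mean)]
predicted_delta = [p - c for p, c in zip(predicted_mean, control_mean)]

score = pearson(predicted_delta, observed_delta)
print(f"Pearson correlation on expression deltas: {score:.3f}")
```

Under this convention, an improvement of 0.14 would be the difference between two such scores computed for two models on the same held-out perturbations.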

📝 Abstract
Predicting how genetic perturbations change cellular state is a core problem for building controllable models of gene regulation. Perturbations targeting the same gene can produce different transcriptional responses depending on their genomic locus, including different transcription start sites and regulatory elements. Gene-level perturbation models collapse these distinct interventions into the same representation. We introduce STRAND, a generative model that predicts single-cell transcriptional responses by conditioning on regulatory DNA sequence. STRAND represents a perturbation by encoding the sequence at its genomic locus and uses this representation to parameterize a conditional transport process from control to perturbed cell states. Representing perturbations by sequence, rather than by a fixed set of gene identifiers, supports zero-shot inference at loci not seen during training and expands inference-time genomic coverage from ~1.5% for gene-level single-cell foundation models to ~95% of the genome. We evaluate STRAND on CRISPR perturbation datasets in K562, Jurkat, and RPE1 cells. STRAND improves discrimination scores by up to 33% in low-sample regimes, achieves the best average rank on unseen gene perturbation benchmarks, and improves transfer to novel cell lines by up to 0.14 in Pearson correlation. Ablations isolate the gains to sequence conditioning and transport, and case studies show that STRAND resolves functionally alternative transcription start sites missed by gene-level models.
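
The contrast between gene-level and sequence-level perturbation representations can be illustrated with a toy sketch. This is not the STRAND encoder: the one-hot encoding, the gene names, and the 8-base sequences below are all hypothetical stand-ins chosen only to show why an identifier lookup collapses distinct loci while a sequence-based representation keeps them apart and extends to genes outside the training vocabulary.

```python
from typing import Dict, List

BASES = "ACGT"


def one_hot(seq: str) -> List[float]:
    """Flatten a DNA string into a one-hot vector (4 entries per base)."""
    vec: List[float] = []
    for base in seq.upper():
        if base not in BASES:
            raise ValueError(f"unexpected base: {base!r}")
        vec.extend(1.0 if base == b else 0.0 for b in BASES)
    return vec


# Hypothetical gene vocabulary seen during training.
vocab: Dict[str, int] = {"GENE_A": 0}

# Two hypothetical perturbations of the same gene at different loci,
# e.g. two alternative transcription start sites (sequences are made up).
perturbations = [
    {"gene": "GENE_A", "locus_seq": "ACGTTGCA"},
    {"gene": "GENE_A", "locus_seq": "TTGACCGA"},
]

# Gene-level representation: an identifier lookup.
gene_reprs = [vocab[p["gene"]] for p in perturbations]

# Sequence-level representation: a function of the DNA at the locus.
seq_reprs = [one_hot(p["locus_seq"]) for p in perturbations]

same_gene_repr = gene_reprs[0] == gene_reprs[1]  # both loci map to one ID
same_seq_repr = seq_reprs[0] == seq_reprs[1]     # sequences differ

# A locus near a gene outside the vocabulary has no gene-level
# representation, but its sequence can still be encoded.
unseen = {"gene": "GENE_B", "locus_seq": "GGCATTAC"}
unseen_has_gene_repr = unseen["gene"] in vocab
unseen_seq_repr = one_hot(unseen["locus_seq"])

print("same gene-level representation:", same_gene_repr)
print("same sequence-level representation:", same_seq_repr)
print("unseen gene has gene-level representation:", unseen_has_gene_repr)
```

In a model of the kind the abstract describes, a learned sequence encoder would take the place of `one_hot`, and its output would condition the transport from control to perturbed cell states.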

Problem

Research questions and friction points this paper is trying to address.

genetic perturbations
single-cell transcriptional responses
regulatory DNA sequence
gene-level models
genomic locus

Innovation

Methods, ideas, or system contributions that make the work stand out.

sequence-conditioned transport
single-cell perturbation
zero-shot genomic inference
generative modeling
regulatory DNA sequence