Latent Causal Diffusions for Single-Cell Perturbation Modeling

📅 2026-01-20
🏛️ arXiv.org
📈 Citations: 1
Influential: 0
🤖 AI Summary
Existing methods for modeling single-cell transcriptomic responses to perturbations often underperform simple baselines, conflate measurement noise with biological signal, and offer limited insight into the causal structure of gene regulation. This work proposes the LCD-CLIPR framework. Latent causal diffusion (LCD) is a generative model that treats single-cell gene expression as a stationary diffusion process observed under measurement noise; it predicts the expression distributions of unseen perturbation combinations while learning a dynamical system of gene regulation. Causal linearization via perturbation responses (CLIPR) then interprets the learned dynamics by approximating the direct causal effects between all modeled genes, and it provably identifies these effects when the drift is linear. On single-cell RNA-seq perturbation screens, LCD outperforms established approaches at predicting distributional shifts. CLIPR recovers causal structure in both simulated systems and a genome-wide perturbation screen, where it groups genes into coherent functional modules and resolves causal relationships that standard differential expression analysis cannot.

📝 Abstract
Perturbation screens hold the potential to systematically map regulatory processes at single-cell resolution, yet modeling and predicting transcriptome-wide responses to perturbations remains a major computational challenge. Existing methods often underperform simple baselines, fail to disentangle measurement noise from biological signal, and provide limited insight into the causal structure governing cellular responses. Here, we present the latent causal diffusion (LCD), a generative model that frames single-cell gene expression as a stationary diffusion process observed under measurement noise. LCD outperforms established approaches in predicting the distributional shifts of unseen perturbation combinations in single-cell RNA-sequencing screens while simultaneously learning a mechanistic dynamical system of gene regulation. To interpret these learned dynamics, we develop an approach we call causal linearization via perturbation responses (CLIPR), which yields an approximation of the direct causal effects between all genes modeled by the diffusion. CLIPR provably identifies causal effects under a linear drift assumption and recovers causal structure in both simulated systems and a genome-wide perturbation screen, where it clusters genes into coherent functional modules and resolves causal relationships that standard differential expression analysis cannot. The LCD-CLIPR framework bridges generative modeling with causal inference to predict unseen perturbation effects and map the underlying regulatory mechanisms of the transcriptome.
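
The role of the linear drift assumption can be illustrated with a toy example. The sketch below is not the paper's LCD or CLIPR implementation; it is a minimal illustration under assumptions of our own: a three-gene system with linear drift dx = (A x + c) dt + sigma dW, perturbations modeled as a constant shift c on one gene, and additive Gaussian measurement noise. The matrix `A_true` and the functions `simulate_cells` and `linearize_from_responses` are hypothetical names introduced here. In this setting the stationary mean is -A^{-1} c, so the mean shifts under single-gene perturbations form the matrix R = -A^{-1}, and the direct effects follow as A = -R^{-1}.

```python
import numpy as np

def simulate_cells(A, shift, n_cells=5000, sigma=0.5, noise_sd=0.3,
                   dt=0.01, n_steps=2000, seed=0):
    """Euler-Maruyama simulation of dx = (A x + shift) dt + sigma dW.

    Runs long enough to approach stationarity, then adds zero-mean
    Gaussian measurement noise to mimic noisy expression readouts.
    Convention (an assumption of this sketch): A[i, j] is the direct
    effect of gene j on gene i.
    """
    rng = np.random.default_rng(seed)
    d = A.shape[0]
    x = np.zeros((n_cells, d))
    for _ in range(n_steps):
        drift = x @ A.T + shift  # row-wise A @ x_i + shift
        x = x + drift * dt + sigma * np.sqrt(dt) * rng.standard_normal((n_cells, d))
    return x + noise_sd * rng.standard_normal((n_cells, d))

def linearize_from_responses(control, perturbed, strength=1.0):
    """Recover a linear drift matrix from mean perturbation responses.

    Column j of R is the mean shift caused by perturbing gene j.
    Under linear drift R = -inv(A), hence A = -inv(R).
    """
    base = control.mean(axis=0)
    R = np.column_stack([(p.mean(axis=0) - base) / strength for p in perturbed])
    return -np.linalg.inv(R)

# Hypothetical regulatory chain: gene 0 -> gene 1 -> gene 2, with self-decay.
A_true = np.array([[-1.0, 0.0, 0.0],
                   [0.8, -1.0, 0.0],
                   [0.0, 0.5, -1.0]])
d = A_true.shape[0]

control = simulate_cells(A_true, np.zeros(d), seed=0)
perturbed = [simulate_cells(A_true, np.eye(d)[j], seed=j + 1) for j in range(d)]

A_hat = linearize_from_responses(control, perturbed)
print(np.round(A_hat, 2))
```

Because the measurement noise is zero-mean, it averages out of the mean shifts, which is why this toy estimator does not need to model it explicitly. Note also that perturbing gene 0 shifts gene 2 through gene 1, yet the recovered entry for the direct effect of gene 0 on gene 2 is close to zero: inverting the response matrix separates direct from indirect effects, a distinction that a comparison of mean expression alone does not make.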
Problem

Research questions and friction points this paper is trying to address.

single-cell perturbation
causal inference
transcriptome modeling
gene regulation
distributional shift
Innovation

Methods, ideas, or system contributions that make the work stand out.

latent causal diffusion
causal inference
single-cell perturbation
generative modeling
gene regulatory dynamics