AI Summary
Single-cell perturbation inference faces a fundamental challenge: destructive sequencing prevents paired measurements of pre- and post-perturbation states, forcing existing methods to either enforce artificial pairings or neglect intrinsic relationships between perturbed and unperturbed cells. To address this, we propose Unlasting, a framework of dual conditional diffusion models built on Dual Diffusion Implicit Bridges (DDIB) for unpaired perturbation response modeling. Unlasting incorporates a gene regulatory network (GRN) to guide perturbation signal propagation at the gene level and employs a masking mechanism to explicitly model gene silencing. Furthermore, we design a bimodality-aware evaluation metric grounded in distribution heterogeneity, balancing generative fidelity and biological interpretability. Across multiple real-world datasets, Unlasting achieves significantly improved prediction accuracy, better preserves cellular heterogeneity, and yields more functionally consistent results.
Abstract
Estimating single-cell responses to perturbations helps identify key genes and supports drug screening, which can substantially improve experimental efficiency. However, single-cell sequencing is destructive, so the same cell's phenotype cannot be captured both before and after perturbation. Data collected under perturbed and unperturbed conditions are therefore inherently unpaired. Existing methods either forcibly pair unpaired cells by random sampling or neglect the inherent relationship between unperturbed and perturbed cells during modeling. In this work, we propose Unlasting, a framework of dual conditional diffusion models based on Dual Diffusion Implicit Bridges (DDIB) that learns the mapping between the two data distributions and thereby addresses the challenge of unpaired data. We further interpret this framework as a form of data augmentation. We integrate gene regulatory network (GRN) information to propagate perturbation signals in a biologically meaningful way, strengthening the model's insight into perturbations, and we add a dedicated mask model that predicts silent genes to improve the quality of generated profiles. Moreover, gene expression under the same perturbation often varies substantially across cells and frequently exhibits a bimodal distribution that reflects intrinsic heterogeneity. To capture this, we introduce a biologically grounded evaluation metric that better reflects the inherent heterogeneity in single-cell responses.
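To make the DDIB idea concrete, the following is a minimal toy sketch, not the Unlasting implementation. It assumes one-dimensional Gaussian "control" and "perturbed" distributions whose noised scores are known in closed form, so no network is trained; Unlasting instead uses learned conditional diffusion models with GRN guidance and a mask model. All names here (`gaussian_score`, `geometric_grid`, `pf_ode`, `ddib_translate`) and the noise schedule are hypothetical choices made for illustration. The sketch encodes a source sample into a shared latent by integrating the source model's probability-flow ODE forward, then decodes that latent with the target model's ODE in reverse, so no paired cells are ever required.

```python
import math

def gaussian_score(x, sigma, mu, std):
    """Score of N(mu, std^2) after adding Gaussian noise of scale sigma.
    Stands in for a learned, condition-specific score network (assumption)."""
    return -(x - mu) / (std ** 2 + sigma ** 2)

def geometric_grid(lo, hi, n):
    """Noise levels spaced geometrically from lo to hi (illustrative schedule)."""
    ratio = (hi / lo) ** (1.0 / (n - 1))
    return [lo * ratio ** k for k in range(n)]

def pf_ode(x, sigmas, mu, std):
    """Euler integration of the variance-exploding probability-flow ODE
    dx/dsigma = -sigma * score(x, sigma) along the given noise grid."""
    for s_cur, s_next in zip(sigmas[:-1], sigmas[1:]):
        drift = -s_cur * gaussian_score(x, s_cur, mu, std)
        x = x + (s_next - s_cur) * drift
    return x

def ddib_translate(x_src, src, tgt, sigma_min=1e-3, sigma_max=80.0, n_steps=2000):
    """Map a source-domain sample to the target domain via a shared latent."""
    sigmas = geometric_grid(sigma_min, sigma_max, n_steps)
    latent = pf_ode(x_src, sigmas, *src)       # encode: data -> noise (source model)
    return pf_ode(latent, sigmas[::-1], *tgt)  # decode: noise -> data (target model)

# Toy "control" distribution N(0, 1) and "perturbed" distribution N(3, 0.5^2).
control = [-1.0, 0.0, 1.0]
perturbed = [ddib_translate(x, src=(0.0, 1.0), tgt=(3.0, 0.5)) for x in control]
print(perturbed)
```

In this toy setting each control value lands close to `3 + 0.5 * x`, the monotone map between the two Gaussians. The two models share only the latent noise distribution, which is what lets each be fit on its own unpaired population.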