What Makes a Representation Good for Single-Cell Perturbation Prediction?

📅 2026-05-19

🤖 AI Summary
This work addresses a challenge in single-cell perturbation prediction: perturbation-specific signals are sparse and often obscured by dominant perturbation-invariant expression structure, which prevents existing methods from learning generalizable causal representations. To overcome this, the authors propose PerturbedVAE, a framework that explicitly separates perturbation-specific features from invariant ones, guided by an identifiability analysis of the conditions under which sparse perturbation effects can be recovered. Built on a variational autoencoder, PerturbedVAE models combinatorial perturbations and achieves state-of-the-art results on a widely used benchmark, with significant gains on out-of-distribution combinatorial predictions. The model also uncovers interpretable perturbation-response programs, offering insight into how cells respond to perturbations.
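The summary says PerturbedVAE separates perturbation-specific from invariant latent factors and handles combinatorial perturbations, but this page gives no architecture details. The sketch below is a hypothetical toy, not the authors' model: it assumes each single perturbation acts as a sparse additive shift on a perturbation-invariant latent vector, and that a combination is predicted by summing the single-perturbation shifts. `LATENT_DIM`, `SHIFTS`, `GENE_A`, `GENE_B`, `sparse_shift`, and `perturbed_latent` are illustrative names invented here.

```python
import random

# Hypothetical toy: all names and values below are illustrative assumptions.
LATENT_DIM = 8


def sparse_shift(active, magnitude):
    """Return a latent shift that is nonzero only on the given coordinates."""
    shift = [0.0] * LATENT_DIM
    for i in active:
        shift[i] = magnitude
    return shift


# Hypothetical per-perturbation shifts; a real model would learn these from data.
SHIFTS = {
    "GENE_A": sparse_shift([1], 2.0),
    "GENE_B": sparse_shift([4, 5], -1.5),
}


def perturbed_latent(z_invariant, perturbations):
    """Compose a (possibly combinatorial) perturbation by adding each single
    perturbation's sparse shift onto the perturbation-invariant latent.
    Additivity across perturbations is an assumption of this sketch."""
    z = list(z_invariant)
    for name in perturbations:
        for i, delta in enumerate(SHIFTS[name]):
            z[i] += delta
    return z


random.seed(0)
z_ctrl = [random.gauss(0.0, 1.0) for _ in range(LATENT_DIM)]  # invariant latent of one cell
z_ab = perturbed_latent(z_ctrl, ["GENE_A", "GENE_B"])         # unseen double perturbation
changed = [i for i in range(LATENT_DIM) if z_ab[i] != z_ctrl[i]]
print(changed)  # [1, 4, 5]
```

Under this additivity assumption, the double perturbation only touches the union of the two singles' latent coordinates. A working model would also need a decoder from latent space back to gene expression and would have to learn the shifts, and their sparsity, from data.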
📝 Abstract
Single-cell perturbation modeling is fundamental for understanding and predicting cellular responses to genetic perturbations. However, existing approaches, from causal representation learning to foundation models, often struggle with an overlooked challenge: gene expression is dominated by perturbation-invariant information, while perturbation-specific signals are intrinsically sparse. As a result, learned representations either entangle invariant and perturbation-specific information, leading to spurious and non-generalizable predictors, or suppress perturbation-specific signals altogether, rendering them ineffective for prediction. To address this, we propose PerturbedVAE, a general framework designed to resolve this signal imbalance. The framework explicitly separates perturbation-specific information from dominant invariant structure and recovers causal representations to effectively utilize such information for prediction. We further provide an identifiability analysis that characterizes the conditions under which sparse perturbation effects can be reliably recovered, thereby clarifying how the framework can be concretely specified under such conditions. Empirically, PerturbedVAE achieves state-of-the-art performance on a widely used benchmark across multiple evaluation settings, yielding significant gains on out-of-distribution combinatorial predictions and uncovering interpretable perturbation-response programs.
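To make the signal-imbalance argument concrete, here is a deterministic toy in plain Python. It is an illustration under stated assumptions, not the paper's experiment: one invariant cell-state factor drives all 100 genes, while a hypothetical perturbation shifts only 3 of them. The code measures how little of the total variance the perturbation explains, then recovers the sparse support by contrasting group means. The toy assumes control and perturbed cells share exactly the same cell states (a matched design), which real single-cell data do not provide; without that pairing, a model has to separate invariant from perturbation-specific structure on its own.

```python
import statistics

# Hypothetical toy setup (all values are illustrative assumptions).
N_GENES = 100
AFFECTED = {3, 17, 42}   # sparse perturbation support: 3 of 100 genes
EFFECT = 1.0             # additive shift on each affected gene
STATES = [k / 2 for k in range(-10, 11)]                  # invariant cell states, -5.0 .. 5.0
LOADINGS = [1.0 + (g % 5) * 0.5 for g in range(N_GENES)]  # every gene follows the cell state


def expression(state, perturbed):
    """Expression vector: invariant state term plus a sparse perturbation shift."""
    return [
        LOADINGS[g] * state + (EFFECT if perturbed and g in AFFECTED else 0.0)
        for g in range(N_GENES)
    ]


# Matched design: control and perturbed cells share the same invariant states.
ctrl = [expression(s, False) for s in STATES]
pert = [expression(s, True) for s in STATES]


def gene_means(cells):
    """Per-gene mean expression over a group of cells."""
    return [statistics.fmean(c[g] for c in cells) for g in range(N_GENES)]


ctrl_mean, pert_mean = gene_means(ctrl), gene_means(pert)

# Total variance of the pooled cells, summed over genes.
pooled = ctrl + pert
total_var = sum(statistics.pvariance([c[g] for c in pooled]) for g in range(N_GENES))

# Variance explained by the perturbation label: for two equal-sized groups the
# between-group variance of a gene is ((mean_pert - mean_ctrl) / 2) ** 2.
between_var = sum(((p - c) / 2) ** 2 for p, c in zip(pert_mean, ctrl_mean))
fraction = between_var / total_var

# Contrasting group means isolates the sparse support.
support = [g for g in range(N_GENES) if abs(pert_mean[g] - ctrl_mean[g]) > EFFECT / 2]

print(f"perturbation share of variance: {fraction:.5f}")
print(support)  # [3, 17, 42]
```

Here the perturbation accounts for well under 0.1% of the total variance even though its effect on the three affected genes is unambiguous. This mirrors the abstract's point that a representation fit to overall expression variance can bury the perturbation-specific signal.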
Problem

Research questions and friction points this paper is trying to address.

single-cell perturbation prediction
perturbation-specific signals
invariant information
representation learning
signal sparsity
Innovation

Methods, ideas, or system contributions that make the work stand out.

PerturbedVAE
causal representation learning
perturbation-specific signal
single-cell perturbation prediction
identifiability analysis
Wenkang Jiang
Australian Institute for Machine Learning, Adelaide University, Australia
Yuhang Liu
The University of Adelaide
Representation Learning, LLMs, Latent Variable Models, Responsible AI
Yichao Cai
Australian Institute for Machine Learning, Adelaide University, Australia
Erdun Gao
University of Adelaide
Causal Inference
Jiayi Dong
College of Computer Science and Artificial Intelligence, Fudan University, China
Ehsan Abbasnejad
Associate Professor, Monash University
Machine Learning, Responsible Machine Learning, Vision and Language, Machine Reasoning, Bayesian
Lina Yao
Science Lead at CSIRO Data61 & Professor at University of New South Wales, Australia
Machine Learning, Reinforcement Learning, Recommender Systems, LLM Agent, Brain Computer Interface
Javen Qinfeng Shi
Australian Institute for Machine Learning, Adelaide University, Australia; Responsible AI Research Centre, Australia