What Makes a Representation Good for Single-Cell Perturbation Prediction?

📅 2026-05-19

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses a central challenge in single-cell perturbation prediction: perturbation-specific signals are sparse and often obscured by dominant perturbation-invariant expression structure, which hinders existing methods from learning generalizable causal representations. To overcome this, the authors propose PerturbedVAE, a framework that explicitly disentangles perturbation-specific features from invariant ones, guided by an identifiability analysis of the conditions under which sparse perturbation effects can be recovered. Built on a variational autoencoder architecture, PerturbedVAE models combinatorial perturbations and reports state-of-the-art performance on a widely used benchmark, with significant gains on out-of-distribution combinatorial predictions. The model also uncovers interpretable perturbation-response programs.

📝 Abstract

Single-cell perturbation modeling is fundamental for understanding and predicting cellular responses to genetic perturbations. However, existing approaches, from causal representation learning to foundation models, often struggle with an overlooked challenge: gene expression is dominated by perturbation-invariant information, while perturbation-specific signals are intrinsically sparse. As a result, learned representations either entangle invariant and perturbation-specific information, leading to spurious and non-generalizable predictors, or suppress perturbation-specific signals altogether, rendering them ineffective for prediction. To address this, we propose PerturbedVAE, a general framework designed to resolve this signal imbalance. The framework explicitly separates perturbation-specific information from dominant invariant structure and recovers causal representations to effectively utilize such information for prediction. We further provide an identifiability analysis that characterizes the conditions under which sparse perturbation effects can be reliably recovered, thereby clarifying how the framework can be concretely specified under such conditions. Empirically, PerturbedVAE achieves state-of-the-art performance on a widely used benchmark across multiple evaluation settings, yielding significant gains on out-of-distribution combinatorial predictions and uncovering interpretable perturbation-response programs.
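
The signal imbalance described in the abstract can be illustrated with a small synthetic example. The sketch below is not PerturbedVAE and does not use the paper's data. It is a toy NumPy simulation in which every name, dimension, and parameter (`simulate`, `n_factors`, `effect`, `lam`, and so on) is an assumption made for illustration. Expression is generated as a dominant low-rank invariant component plus a shift on a handful of genes, and a plain soft-thresholded difference of means stands in for the idea of recovering a sparse perturbation effect.

```python
import numpy as np


def simulate(n_cells=10000, n_genes=200, n_factors=5, n_affected=8,
             effect=1.0, seed=0):
    """Toy data: dominant invariant structure plus a sparse perturbation shift.

    All sizes and scales are illustrative assumptions, not values from the paper.
    """
    rng = np.random.default_rng(seed)
    # Invariant structure: a few latent factors shared by all cells.
    loadings = rng.normal(0.0, 1.0, size=(n_factors, n_genes))
    factors = rng.normal(0.0, 1.0, size=(n_cells, n_factors))
    invariant = factors @ loadings
    # Roughly half of the cells receive the (hypothetical) perturbation.
    perturbed = rng.random(n_cells) < 0.5
    # Sparse effect: only `n_affected` genes shift under perturbation.
    delta = np.zeros(n_genes)
    affected = rng.choice(n_genes, size=n_affected, replace=False)
    delta[affected] = effect
    noise = rng.normal(0.0, 0.5, size=(n_cells, n_genes))
    x = invariant + np.outer(perturbed, delta) + noise
    return x, perturbed, delta


def perturbation_variance_fraction(x, perturbed, delta):
    """Share of total expression variance attributable to the perturbation shift."""
    signal = np.outer(perturbed, delta)
    signal = signal - signal.mean(axis=0)
    total = ((x - x.mean(axis=0)) ** 2).sum()
    return (signal ** 2).sum() / total


def soft_threshold(v, lam):
    """Shrink entries toward zero; entries with |v| <= lam become exactly zero."""
    return np.sign(v) * np.maximum(np.abs(v) - lam, 0.0)


def estimate_sparse_shift(x, perturbed, lam=0.5):
    """Soft-thresholded mean difference between perturbed and control cells."""
    diff = x[perturbed].mean(axis=0) - x[~perturbed].mean(axis=0)
    return soft_threshold(diff, lam)


if __name__ == "__main__":
    x, perturbed, delta = simulate()
    frac = perturbation_variance_fraction(x, perturbed, delta)
    est = estimate_sparse_shift(x, perturbed)
    print(f"variance fraction from perturbation: {frac:.4f}")
    print("true affected genes:     ", np.flatnonzero(delta))
    print("estimated affected genes:", np.flatnonzero(est))
```

In this toy setting the perturbation accounts for only a small share of total variance, so a representation trained purely to reconstruct expression has little incentive to encode it. The sparse shift is still recoverable here only because perturbation assignment is randomized and independent of the invariant factors, a simplification that real perturbation screens may not satisfy.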

Problem

Research questions and friction points this paper is trying to address.

single-cell perturbation prediction

perturbation-specific signals

invariant information

representation learning

signal sparsity

Innovation

Methods, ideas, or system contributions that make the work stand out.

PerturbedVAE

causal representation learning

perturbation-specific signal