Identifying perturbation targets through causal differential networks

📅 2024-10-04
📈 Citations: 0
Influential: 0
🤖 AI Summary
Identifying intervention targets in single-cell biology, i.e., inferring which variables were perturbed from paired observational and interventional data, is challenging because of small per-intervention sample sizes, high dimensionality, and violations of classical causal assumptions.

Method: We propose Causal Differential Networks (CDN), a supervised framework that first infers noisy causal graphs from the observational and interventional data, then maps the differences between these graphs, together with additional statistical features, to the set of intervened variables. Both modules are trained jointly on simulated and real data that reflect the nature of biological interventions.

Contribution/Results: CDN consistently outperforms perturbation-modeling baselines on seven single-cell transcriptomics datasets. On a variety of synthetic data, it also improves significantly over current causal discovery methods at predicting both soft and hard intervention targets. Target identification of this kind supports applications in drug target discovery and cell engineering.


📝 Abstract
Identifying variables responsible for changes to a biological system enables applications in drug target discovery and cell engineering. Given a pair of observational and interventional datasets, the goal is to isolate the subset of observed variables that were the targets of the intervention. Directly applying causal discovery algorithms is challenging: the data may contain thousands of variables with as few as tens of samples per intervention, and biological systems do not adhere to classical causality assumptions. We propose a causality-inspired approach to address this practical setting. First, we infer noisy causal graphs from the observational and interventional data. Then, we learn to map the differences between these graphs, along with additional statistical features, to sets of variables that were intervened upon. Both modules are jointly trained in a supervised framework, on simulated and real data that reflect the nature of biological interventions. This approach consistently outperforms baselines for perturbation modeling on seven single-cell transcriptomics datasets. We also demonstrate significant improvements over current causal discovery methods for predicting soft and hard intervention targets across a variety of synthetic data.
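The two-stage idea in the abstract (compare graphs inferred from each dataset, then score each variable from graph differences plus statistical features) can be illustrated with a toy sketch. This is not the authors' CDN implementation. As loudly labeled assumptions: the absolute Pearson correlation matrix stands in for the inferred noisy causal graph, the standardized mean shift is the only "statistical feature", and a fixed linear score with hand-set weights (`w_graph`, `w_shift`) stands in for the supervised model that the paper trains jointly. All function names are hypothetical.

```python
from math import sqrt
from statistics import mean, pstdev

# Hypothetical layout: data[i] holds all samples of variable i.
Columns = list[list[float]]


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation; returns 0.0 if either variable is constant."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)


def soft_graph(data: Columns) -> list[list[float]]:
    """|correlation| matrix: a crude, undirected stand-in (assumption) for a
    noisy causal graph inferred by a causal discovery module."""
    d = len(data)
    return [[0.0 if i == j else abs(pearson(data[i], data[j])) for j in range(d)]
            for i in range(d)]


def target_features(obs: Columns, intv: Columns) -> list[tuple[float, float]]:
    """Per-variable features: (graph difference, standardized mean shift)."""
    g_obs, g_int = soft_graph(obs), soft_graph(intv)
    feats = []
    for i in range(len(obs)):
        # Total change in variable i's edge weights between the two graphs.
        graph_diff = sum(abs(a - b) for a, b in zip(g_obs[i], g_int[i]))
        sd = pstdev(obs[i]) or 1.0  # guard against constant variables
        shift = abs(mean(intv[i]) - mean(obs[i])) / sd
        feats.append((graph_diff, shift))
    return feats


def rank_targets(obs: Columns, intv: Columns,
                 w_graph: float = 1.0, w_shift: float = 1.0) -> list[int]:
    """Rank variables by a fixed linear score (hypothetical; CDN learns this mapping)."""
    feats = target_features(obs, intv)
    scores = [w_graph * g + w_shift * s for g, s in feats]
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)


# Toy data: x1 -> x0 observationally; x2 is unrelated to both.
x1 = [1, 2, 3, 4, 5, 6]
x2 = [2, 5, 1, 6, 3, 4]
obs = [[1.1, 1.9, 3.2, 3.8, 5.1, 6.0], x1, x2]  # x0 tracks its parent x1
intv = [[10, 12, 11, 11, 12, 10], x1, x2]        # x0 clamped near 11, cut off from x1
print(rank_targets(obs, intv))  # → [0, 1, 2]
```

In this toy example the graph difference alone gives x0 and its parent x1 similar scores, because losing the x1-x0 edge changes both rows of the graph; the mean-shift feature separates them. That is one plausible reason to combine graph differences with other statistical features, though the paper's learned model is far more general than this hand-weighted score.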
Problem

Research questions and friction points this paper is trying to address.

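Identify which variables were the targets of an intervention, given paired observational and interventional data from a biological system.
Cope with data containing thousands of variables and as few as tens of samples per intervention, where classical causality assumptions do not hold.
Predict both soft and hard intervention targets.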
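The distinction between soft and hard interventions can be made concrete with a small simulation. A common way to generate such data, assumed here rather than taken from the paper, is a linear Gaussian structural equation model: a hard intervention replaces the target's mechanism entirely (cutting its incoming edges), while a soft intervention keeps the parents but alters the mechanism. The three-variable chain, its edge weights, and the specific soft-intervention rule below are illustrative choices, and `simulate`/`corr` are hypothetical helpers.

```python
import random
from math import sqrt

# Hypothetical chain x0 -> x1 -> x2: child index -> list of (parent index, weight).
PARENTS = {0: [], 1: [(0, 2.0)], 2: [(1, -1.5)]}


def simulate(n, target=None, kind="obs", seed=0):
    """Draw n samples (rows) from the SEM, optionally intervening on `target`."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = [0.0] * len(PARENTS)
        for i in range(len(PARENTS)):  # indices are already in topological order
            signal = sum(w * x[p] for p, w in PARENTS[i])
            if i == target and kind == "hard":
                # Hard: ignore all parents; draw from a new distribution.
                x[i] = rng.gauss(5.0, 1.0)
            elif i == target and kind == "soft":
                # Soft: parents still matter, but the mechanism changes
                # (here: halved edge weights and a shifted noise mean).
                x[i] = 0.5 * signal + rng.gauss(1.0, 1.0)
            else:
                x[i] = signal + rng.gauss(0.0, 1.0)
        rows.append(x)
    return rows


def corr(rows, i, j):
    """Pearson correlation between columns i and j of a list of rows."""
    xs = [r[i] for r in rows]
    ys = [r[j] for r in rows]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    syy = sum((b - my) ** 2 for b in ys)
    return sxy / sqrt(sxx * syy)


obs = simulate(2000, seed=0)
hard = simulate(2000, target=1, kind="hard", seed=1)
soft = simulate(2000, target=1, kind="soft", seed=2)
for name, rows in [("obs", obs), ("hard", hard), ("soft", soft)]:
    print(name, round(corr(rows, 0, 1), 2), round(corr(rows, 1, 2), 2))
```

Under this model the population correlation between x0 and x1 is about 0.89 observationally, 0 under the hard intervention on x1, and about 0.71 under the soft one, while x1 still strongly drives x2 in all three cases. A hard target therefore leaves a clear trace in the graph (a missing incoming edge), whereas a soft target only weakens or reshapes its dependencies, which is what makes soft targets harder to detect.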
Innovation

Methods, ideas, or system contributions that make the work stand out.

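Noisy causal graph inference from paired observational and interventional data
Supervised mapping from graph differences and additional statistical features to intervention targets, trained jointly with graph inference
Training on simulated and real data reflecting biological interventions; evaluation on seven single-cell transcriptomics datasets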
Menghua Wu
Massachusetts Institute of Technology
machine learning for biology
Umesh Padia
Department of Computer Science, Massachusetts Institute of Technology
Sean H. Murphy
Department of Computer Science, Massachusetts Institute of Technology
R. Barzilay
Department of Computer Science, Massachusetts Institute of Technology
T. Jaakkola
Department of Computer Science, Massachusetts Institute of Technology