🤖 AI Summary
This study addresses three key challenges: integrating multi-source heterogeneous perturbation data, improving weak mechanistic generalization, and overcoming limited knowledge transfer across perturbation types. We propose the Large Perturbation Model (LPM), the first framework to introduce a three-dimensional disentangled representation (“perturbation–readout–context”), enabling unified modeling of chemical and genetic perturbations and zero-shot response prediction. LPM integrates multimodal deep embedding, contrastive learning, and conditional generative modeling to achieve accurate transcriptomic prediction, cross-experiment mechanistic pattern discovery, and gene interaction inference. On benchmark tasks, LPM significantly outperforms state-of-the-art methods: it improves R² by 18% when predicting transcriptomes of unseen experiments, increases chemical–genetic mechanism matching accuracy by 23%, and achieves an AUPRC of 0.41 for gene regulatory network inference.
📝 Abstract
Data generated in perturbation experiments link perturbations to the changes they elicit and therefore contain information relevant to numerous biological discovery tasks, from understanding the relationships between biological entities to developing therapeutics. However, these data encompass diverse perturbations and readouts, and the complex dependence of experimental outcomes on their biological context makes it challenging to integrate insights across experiments. Here, we present the Large Perturbation Model (LPM), a deep-learning model that integrates multiple, heterogeneous perturbation experiments by representing perturbation, readout, and context as disentangled dimensions. LPM outperforms existing methods across multiple biological discovery tasks, including predicting post-perturbation transcriptomes of unseen experiments, identifying shared molecular mechanisms of action between chemical and genetic perturbations, and facilitating the inference of gene-gene interaction networks.
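To make the idea of treating perturbation, readout, and context as separate dimensions more concrete, here is a minimal sketch in NumPy. It is not the authors' architecture: the class name `PRCSketch`, the embedding sizes, and the small two-layer network are all assumptions chosen for illustration, and training is omitted. The sketch only shows how one embedding table per dimension lets a model score any (perturbation, readout, context) combination, including combinations never observed together.

```python
import numpy as np


class PRCSketch:
    """Toy forward pass over separate perturbation/readout/context embeddings.

    Hypothetical illustration only; all names and sizes are assumptions.
    """

    def __init__(self, n_perturbations, n_readouts, n_contexts,
                 dim=8, hidden=16, seed=0):
        rng = np.random.default_rng(seed)
        # One embedding table per dimension, so each entity is represented
        # independently of the experiments in which it happened to appear.
        self.P = rng.normal(scale=0.1, size=(n_perturbations, dim))
        self.R = rng.normal(scale=0.1, size=(n_readouts, dim))
        self.C = rng.normal(scale=0.1, size=(n_contexts, dim))
        # Small randomly initialized network mapping the triple to a scalar.
        self.W1 = rng.normal(scale=0.1, size=(3 * dim, hidden))
        self.b1 = np.zeros(hidden)
        self.w2 = rng.normal(scale=0.1, size=hidden)
        self.b2 = 0.0

    def predict(self, p, r, c):
        """Return one predicted readout value per (p, r, c) index triple."""
        p, r, c = np.atleast_1d(p), np.atleast_1d(r), np.atleast_1d(c)
        # Look up each dimension separately, then concatenate per example.
        z = np.concatenate([self.P[p], self.R[r], self.C[c]], axis=1)
        h = np.tanh(z @ self.W1 + self.b1)
        return h @ self.w2 + self.b2


model = PRCSketch(n_perturbations=5, n_readouts=10, n_contexts=3)
# Query two triples at once; nothing requires that a given
# perturbation/context pair was ever measured together.
y = model.predict(p=[0, 4], r=[2, 7], c=[1, 2])
print(y.shape)
```

Because the three lookups are independent, a query for an unseen experiment reduces to indexing with a new combination of already-known indices. Whether that generalizes in practice depends on training data and objective, neither of which this sketch covers.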