Modeling Gene Expression Distributional Shifts for Unseen Genetic Perturbations

📅 2025-07-01
🤖 AI Summary
In early-stage drug discovery, existing methods for predicting responses to gene perturbations model only changes in mean expression, so they miss cellular heterogeneity. This work introduces a deep learning framework that predicts the full distribution of single-cell gene expression and captures higher-order statistics such as variance, skewness, and kurtosis. Methodologically, it uses gene-level histograms as prediction targets and incorporates gene embeddings derived from large language models as biologically informed priors, which enables generalization to unseen perturbations. In experiments, the model outperforms baselines on distributional modeling (12.7% lower KL divergence), reduces training cost by 35%, and remains competitive in predicting mean expression. By making perturbation-response modeling distribution-aware, this work provides a more realistic basis for target identification and functional interpretation in perturbation biology.
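The summary says the model uses gene-level histograms as prediction targets and reports results in KL divergence. Below is a minimal sketch of those two pieces, assuming fixed bin edges per gene and a small pseudo-count that keeps every bin nonzero (both are our assumptions; this page does not describe the paper's actual binning scheme or training loss). The names `gene_histogram` and `kl_divergence` are hypothetical.

```python
import bisect
import math
from typing import List, Sequence


def gene_histogram(values: Sequence[float], edges: Sequence[float], eps: float = 1e-8) -> List[float]:
    """Bin one gene's per-cell expression values into a normalized histogram.

    `edges` (hypothetical, e.g. shared across perturbations for this gene) has one
    more entry than there are bins; out-of-range values are clipped into the edge bins.
    """
    n_bins = len(edges) - 1
    counts = [0.0] * n_bins
    for v in values:
        # bisect_right gives half-open bins [edges[i], edges[i+1]); clamp to a valid index.
        idx = min(max(bisect.bisect_right(edges, v) - 1, 0), n_bins - 1)
        counts[idx] += 1.0
    # A small pseudo-count keeps every bin strictly positive so KL stays finite.
    total = sum(counts) + eps * n_bins
    return [(c + eps) / total for c in counts]


def kl_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """KL(p || q) in nats for two discrete distributions over the same bins."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


# Toy per-cell expression values for one gene (made-up numbers for illustration).
edges = [0.0, 1.0, 2.0, 3.0, 4.0]
observed = gene_histogram([0.2, 0.5, 1.5, 1.7, 2.2, 3.9], edges)
predicted = [0.25, 0.25, 0.25, 0.25]  # stand-in for an uninformative model output
print(kl_divergence(observed, predicted))
```

Here `predicted` stands in for a model output. A lower KL divergence means the predicted histogram is closer to the observed one, and identical histograms give zero.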

📝 Abstract
We train a neural network to predict distributional responses in gene expression following genetic perturbations. This is an essential task in early-stage drug discovery, where such responses can offer insights into gene function and inform target identification. Existing methods only predict changes in the mean expression, overlooking the stochasticity inherent in single-cell data. In contrast, we offer a more realistic view of cellular responses by modeling expression distributions. Our model predicts gene-level histograms conditioned on perturbations and outperforms baselines in capturing higher-order statistics, such as variance, skewness, and kurtosis, at a fraction of the training cost. To generalize to unseen perturbations, we incorporate prior knowledge via gene embeddings from large language models (LLMs). While modeling a richer output space, the method remains competitive in predicting mean expression changes. This work offers a practical step towards more expressive and biologically informative models of perturbation effects.
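Variance, skewness, and kurtosis can all be read off a predicted histogram. Below is a minimal sketch, assuming all mass in a bin sits at its midpoint. That approximation is ours, for illustration, and is not necessarily how the paper evaluates these statistics. `histogram_moments` is a hypothetical helper.

```python
import math
from typing import Dict, Sequence


def histogram_moments(probs: Sequence[float], edges: Sequence[float]) -> Dict[str, float]:
    """Approximate summary statistics of a binned distribution.

    Assumes all mass in a bin sits at the bin midpoint, so results are
    approximations that improve as bins get narrower.
    """
    if len(edges) != len(probs) + 1:
        raise ValueError("edges must have exactly one more entry than probs")
    total = sum(probs)
    p = [x / total for x in probs]  # renormalize in case of rounding error
    mids = [(a + b) / 2 for a, b in zip(edges[:-1], edges[1:])]
    mean = sum(pi * m for pi, m in zip(p, mids))
    # Central moments of order 2, 3 and 4 about the mean.
    m2, m3, m4 = (sum(pi * (m - mean) ** k for pi, m in zip(p, mids)) for k in (2, 3, 4))
    return {
        "mean": mean,
        "variance": m2,
        "skewness": m3 / m2 ** 1.5 if m2 > 0 else 0.0,
        # Excess kurtosis: 0 for a normal distribution.
        "excess_kurtosis": m4 / m2 ** 2 - 3.0 if m2 > 0 else 0.0,
    }


stats = histogram_moments([0.25, 0.5, 0.25], [0.0, 1.0, 2.0, 3.0])
print(stats)  # {'mean': 1.5, 'variance': 0.5, 'skewness': 0.0, 'excess_kurtosis': -1.0}
```

The symmetric toy histogram has zero skewness. Its negative excess kurtosis reflects tails lighter than a normal distribution's.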
Problem

Predict shifts in gene expression distributions following genetic perturbations
Overcome limitations of mean-only prediction in single-cell data
Generalize to unseen perturbations using gene embeddings from LLMs
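Conditioning on a gene embedding, rather than on a learned identifier per perturbation, lets a model be queried for perturbations absent from training, because any gene with an embedding yields a prediction. Below is a minimal sketch under loudly simplified assumptions: a single linear layer with a softmax over bins for each readout gene, random weights in place of trained ones, and toy 4-dimensional vectors standing in for LLM-derived embeddings. The architecture and all names here are ours and hypothetical, not the paper's.

```python
import math
import random
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax: subtract the max before exponentiating."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]


def predict_histograms(embedding: Sequence[float], weights: List[List[List[float]]]) -> List[List[float]]:
    """Map the embedding of the perturbed gene to one histogram per readout gene.

    weights[g][b] is a (hypothetical) weight vector, the same length as the
    embedding, producing the logit of bin b for readout gene g.
    """
    histograms = []
    for gene_weights in weights:
        logits = [sum(w * e for w, e in zip(bin_w, embedding)) for bin_w in gene_weights]
        histograms.append(softmax(logits))
    return histograms


# Hypothetical 4-d vectors standing in for LLM-derived gene embeddings.
# GENE_A and GENE_B play the role of training perturbations; GENE_C is held out.
embeddings = {
    "GENE_A": [0.9, 0.1, 0.0, 0.3],
    "GENE_B": [0.1, 0.8, 0.5, 0.0],
    "GENE_C": [0.5, 0.5, 0.2, 0.1],
}
rng = random.Random(0)
n_readout_genes, n_bins, dim = 3, 5, 4
# Random weights are placeholders for trained parameters.
weights = [[[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bins)]
           for _ in range(n_readout_genes)]

# The unseen perturbation still has an embedding, so it can be queried directly.
hists = predict_histograms(embeddings["GENE_C"], weights)
for hist in hists:
    print([round(p, 3) for p in hist])
```

In a real setting the weights would be fit on observed perturbations only. A held-out gene such as the hypothetical GENE_C would then be evaluated by querying its embedding in exactly this way.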
Innovation

Neural network predicts gene expression distributions
Incorporates gene embeddings from large language models
Captures higher-order statistics at a fraction of the baselines' training cost
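A distributional prediction still yields the mean-expression change that existing methods target, and a toy example also shows what a mean-only view misses. Below is a minimal sketch, assuming bin-midpoint representatives and a control (unperturbed) histogram per gene, with made-up numbers. The helper names are hypothetical.

```python
from typing import List, Sequence


def histogram_mean(probs: Sequence[float], edges: Sequence[float]) -> float:
    """Mean of a binned distribution, placing each bin's mass at its midpoint."""
    mids = [(a + b) / 2 for a, b in zip(edges[:-1], edges[1:])]
    return sum(p * m for p, m in zip(probs, mids)) / sum(probs)


def mean_shift(perturbed: List[Sequence[float]], control: List[Sequence[float]],
               edges: Sequence[float]) -> List[float]:
    """Per-gene change in mean expression, perturbed minus control."""
    return [histogram_mean(p, edges) - histogram_mean(c, edges)
            for p, c in zip(perturbed, control)]


edges = [0.0, 1.0, 2.0, 3.0]
control = [[0.5, 0.3, 0.2], [0.2, 0.6, 0.2]]
# Gene 1 shifts upward; gene 2 becomes bimodal while keeping the same mean.
perturbed = [[0.2, 0.3, 0.5], [0.5, 0.0, 0.5]]
shifts = mean_shift(perturbed, control, edges)
print(shifts)
```

The second gene shows no mean shift even though its histogram changes from unimodal to bimodal. A mean-only prediction cannot represent this kind of response.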
Authors

Kalyan Ramakrishnan, University of Oxford
Jonathan G. Hedley, University of Oxford
Sisi Qu, University of Oxford
Puneet K. Dokania, University of Oxford and Bosch (Five AI)
Philip H. S. Torr, University of Oxford
Cesar A. Prada-Medina, Novo Nordisk
Julien Fauqueur, Novo Nordisk
Kaspar Martens, Novo Nordisk