TxPert: Leveraging Biochemical Relationships for Out-of-Distribution Transcriptomic Perturbation Prediction

📅 2025-05-20
📈 Citations: 1
Influential: 0
🤖 AI Summary
This study addresses the challenge of improving out-of-distribution (OOD) generalization when predicting transcriptional responses to genetic perturbations, specifically unseen single- and double-gene perturbations and novel cell lines. To overcome the limited experimental coverage that constrains existing methods, we introduce, for the first time, multi-source biological knowledge graphs to guide OOD modeling. We also establish a rigorous benchmark framework that enforces strict generalization across perturbation types and cell lines. Our method combines graph neural networks for knowledge-graph encoding, multi-relational heterogeneous graph aggregation, OOD-aware training, and an interpretable attention mechanism. Across all three OOD settings, TxPert achieves a mean R² improvement of 12.7% over state-of-the-art baselines. We publicly release both the new benchmark and the model implementation.
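
The summary reports TxPert's gains in terms of R². The sketch below shows how such a score can be computed: it takes the coefficient of determination between observed and predicted expression changes, each measured relative to a control profile. Scoring per-gene deltas rather than raw expression is an assumption on our part, and so are the function names `expression_delta`, `r2_score`, and `delta_r2`. The paper's exact metric definition and aggregation may differ.

```python
from typing import List, Sequence


def expression_delta(profile: Sequence[float], control: Sequence[float]) -> List[float]:
    """Per-gene change of a perturbed expression profile relative to a control profile."""
    return [p - c for p, c in zip(profile, control)]


def r2_score(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mu = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mu) ** 2 for t in y_true)
    if ss_tot == 0:
        # R² is undefined when the observed values have no variance.
        raise ValueError("R² is undefined when the observed values are constant")
    return 1.0 - ss_res / ss_tot


def delta_r2(observed: Sequence[float], predicted: Sequence[float],
             control: Sequence[float]) -> float:
    """Hypothetical metric: R² between observed and predicted changes versus control."""
    return r2_score(expression_delta(observed, control),
                    expression_delta(predicted, control))
```

For example, take the control `[1, 1, 1, 1]`, the observed profile `[2, 0, 3, 1]`, and the prediction `[2, 0, 2, 1]`. The observed deltas `[1, -1, 2, 0]` have SS_tot = 5 and the prediction misses by SS_res = 1, so `delta_r2` returns 0.8. Working with deltas keeps the score from being inflated by baseline expression, which any model reproduces easily.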

📝 Abstract
Accurately predicting cellular responses to genetic perturbations is essential for understanding disease mechanisms and designing effective therapies. Yet exhaustively exploring the space of possible perturbations (e.g., multi-gene perturbations or across tissues and cell types) is prohibitively expensive, motivating methods that can generalize to unseen conditions. In this work, we explore how knowledge graphs of gene-gene relationships can improve out-of-distribution (OOD) prediction across three challenging settings: unseen single perturbations; unseen double perturbations; and unseen cell lines. In particular, we present: (i) TxPert, a new state-of-the-art method that leverages multiple biological knowledge networks to predict transcriptional responses under OOD scenarios; (ii) an in-depth analysis demonstrating the impact of graphs, model architecture, and data on performance; and (iii) an expanded benchmarking framework that strengthens evaluation standards for perturbation modeling.
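
The three OOD settings differ in what is withheld from training. The sketch below builds each split from a list of (perturbation, cell line) records. The `Record` type and the three split functions are hypothetical. Two conventions are also assumptions, not the benchmark's published protocol:

- an unseen-double test pair may still have its constituent single perturbations in training;
- an unseen single gene is removed from training entirely.

```python
from typing import FrozenSet, List, NamedTuple, Set, Tuple


class Record(NamedTuple):
    perturbation: FrozenSet[str]  # one gene (single) or two genes (double)
    cell_line: str


Split = Tuple[List[Record], List[Record]]  # (train, test)


def unseen_single_split(records: List[Record], held_out_genes: Set[str]) -> Split:
    """Test on single perturbations of held-out genes; drop every record touching them from training."""
    test = [r for r in records
            if len(r.perturbation) == 1 and r.perturbation <= held_out_genes]
    train = [r for r in records if not (r.perturbation & held_out_genes)]
    return train, test


def unseen_double_split(records: List[Record],
                        held_out_pairs: Set[FrozenSet[str]]) -> Split:
    """Test on held-out gene pairs; their constituent singles may remain in training."""
    test = [r for r in records if r.perturbation in held_out_pairs]
    train = [r for r in records if r.perturbation not in held_out_pairs]
    return train, test


def unseen_cell_line_split(records: List[Record], held_out_line: str) -> Split:
    """Test on every record from one cell line never seen during training."""
    test = [r for r in records if r.cell_line == held_out_line]
    train = [r for r in records if r.cell_line != held_out_line]
    return train, test
```

Holding out an entire cell line is the strictest of the three settings. In that case the model must transfer what it learned about perturbation effects to a new cellular context, rather than to a new gene or gene combination.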
Problem

Predicting cellular responses to unseen genetic perturbations
Improving out-of-distribution prediction using gene-gene knowledge graphs
Enhancing perturbation modeling evaluation standards
Innovation

Leverages gene-gene knowledge graphs
Predicts OOD transcriptional responses
Uses multiple biological knowledge networks
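
TxPert is described as combining several gene-gene knowledge networks. One well-known way to do that is a relational GCN (R-GCN)-style layer. Each graph is treated as a separate relation with its own weight matrix, and the per-relation neighbor means are summed with a self term before a ReLU. This technique is swapped in for illustration only. The graph names, weights, and functions below are hypothetical, and TxPert's actual encoder, including its attention mechanism, may differ.

```python
from typing import Dict, List, Tuple

Vector = List[float]
Matrix = List[List[float]]
Graph = List[Tuple[str, str]]  # hypothetical undirected edge list for one knowledge graph


def matvec(w: Matrix, x: Vector) -> Vector:
    """Multiply a (d_out x d_in) matrix by a length-d_in vector."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


def relu(x: Vector) -> Vector:
    return [max(0.0, v) for v in x]


def multi_relational_layer(h: Dict[str, Vector], graphs: Dict[str, Graph],
                           w_rel: Dict[str, Matrix], w_self: Matrix) -> Dict[str, Vector]:
    """R-GCN-style update: h_i' = ReLU(W_self h_i + sum_r mean_{j in N_r(i)} W_r h_j)."""
    out: Dict[str, Vector] = {}
    for gene, hi in h.items():
        acc = matvec(w_self, hi)
        for rel, edges in graphs.items():
            # Neighbors of `gene` in this relation, treating edges as undirected.
            nbrs = [b for a, b in edges if a == gene] + [a for a, b in edges if b == gene]
            if not nbrs:
                continue  # gene has no edges in this knowledge graph
            msgs = [matvec(w_rel[rel], h[j]) for j in nbrs]
            mean = [sum(col) / len(msgs) for col in zip(*msgs)]
            acc = [a + m for a, m in zip(acc, mean)]
        out[gene] = relu(acc)
    return out
```

As an example, use identity weight matrices and toy embeddings A = [1, 0], B = [0, 1], and C = [1, 1]. Add a `ppi` edge A-B and a `pathway` edge A-C. After one layer, gene A becomes [2.0, 2.0] because it aggregates messages from both graphs. Gene B becomes [1.0, 1.0], since it receives a message only through `ppi`. Using separate weights per graph lets the model treat, say, physical interactions differently from pathway co-membership.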
Frederik Wenkel
Valence Labs
AI for Drug Discovery
Wilson Tu
Valence Labs, Montréal, QC, Canada; Recursion, Salt Lake City, UT, USA
Cassandra Masschelein
Valence Labs, Montréal, QC, Canada; Recursion, Salt Lake City, UT, USA
Hamed Shirzad
Ph.D. Student, Computer Science Department, UBC
Graph Representation Learning, Machine Learning on Graphs, Graph Generative Models
Cian Eastwood
University of Edinburgh
Machine Learning, Causality, Representation Learning
Shawn T. Whitfield
Valence Labs, Montréal, QC, Canada; Recursion, Salt Lake City, UT, USA
Ihab Bendidi
Valence Labs, Montréal, QC, Canada; Recursion, Salt Lake City, UT, USA
Craig Russell
Valence Labs, Montréal, QC, Canada; Recursion, Salt Lake City, UT, USA
Liam Hodgson
Valence Labs, Montréal, QC, Canada; Recursion, Salt Lake City, UT, USA
Yassir El Mesbahi
Valence Labs, Montréal, QC, Canada; Recursion, Salt Lake City, UT, USA
Jiarui Ding
Computer Science, University of British Columbia, Vancouver, BC, Canada
Marta M. Fay
Recursion, Salt Lake City, UT, USA
Berton Earnshaw
Valence Labs, Montréal, QC, Canada; Recursion, Salt Lake City, UT, USA
Emmanuel Noutahi
Valence Labs
representation learning, generative models, drug design, genome evolution, molecular optimization
Alisandra K. Denton
Valence Labs, QC, Canada
bioinformatics, deep learning, biology, multimodality, perturbation analysis