🤖 AI Summary
This study addresses out-of-distribution (OOD) generalization in predicting transcriptional responses to genetic perturbations, specifically for unseen single-gene perturbations, unseen double-gene perturbations, and novel cell lines. To overcome the limited experimental coverage that constrains existing methods, we introduce, for the first time, multi-source biological knowledge graphs to guide OOD modeling, and we establish a rigorous benchmark framework that enforces strict cross-perturbation-type and cross-cell-line generalization. Our method, TxPert, integrates graph neural networks for knowledge graph encoding, multi-relational heterogeneous graph aggregation, OOD-aware training, and an interpretable attention mechanism. Across all three OOD settings, TxPert achieves a mean R² improvement of 12.7% over state-of-the-art baselines. We publicly release both the new benchmark and the model implementation.
📝 Abstract
Accurately predicting cellular responses to genetic perturbations is essential for understanding disease mechanisms and designing effective therapies. Yet exhaustively exploring the space of possible perturbations (e.g., multi-gene perturbations, or perturbations across tissues and cell types) is prohibitively expensive, motivating methods that can generalize to unseen conditions. In this work, we explore how knowledge graphs of gene-gene relationships can improve out-of-distribution (OOD) prediction across three challenging settings: unseen single perturbations, unseen double perturbations, and unseen cell lines. In particular, we present: (i) TxPert, a new state-of-the-art method that leverages multiple biological knowledge networks to predict transcriptional responses under OOD scenarios; (ii) an in-depth analysis demonstrating the impact of graphs, model architecture, and data on performance; and (iii) an expanded benchmarking framework that strengthens evaluation standards for perturbation modeling.
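To make the three OOD settings concrete, the following is a minimal sketch of how such train/test splits could be constructed. It is not the benchmark's actual implementation: the `Sample` type, the function names, the gene and cell-line labels, and the rule of dropping double perturbations that involve a held-out gene are all assumptions made for illustration.

```python
from typing import FrozenSet, List, NamedTuple, Sequence, Set, Tuple


class Sample(NamedTuple):
    """Hypothetical record for one perturbation experiment."""
    cell_line: str
    genes: FrozenSet[str]  # perturbed genes: 1 gene = single, 2 genes = double


Split = Tuple[List[Sample], List[Sample]]  # (train, test)


def split_unseen_single(samples: Sequence[Sample], held_out: Set[str]) -> Split:
    """Test on single perturbations of genes never perturbed in training."""
    train, test = [], []
    for s in samples:
        if not (s.genes & held_out):
            train.append(s)  # no held-out gene involved
        elif len(s.genes) == 1:
            test.append(s)   # single perturbation of a held-out gene
        # Assumption: doubles touching a held-out gene are dropped entirely
        # so the held-out gene never leaks into training.
    return train, test


def split_unseen_double(
    samples: Sequence[Sample], held_out_pairs: Set[FrozenSet[str]]
) -> Split:
    """Test on specific gene pairs whose combination is absent from training."""
    train, test = [], []
    for s in samples:
        (test if s.genes in held_out_pairs else train).append(s)
    return train, test


def split_unseen_cell_line(samples: Sequence[Sample], held_out_line: str) -> Split:
    """Test on every sample from a cell line absent from training."""
    train, test = [], []
    for s in samples:
        (test if s.cell_line == held_out_line else train).append(s)
    return train, test


# Toy data with placeholder names (not taken from the paper).
samples = [
    Sample("LINE_X", frozenset({"GENE_A"})),
    Sample("LINE_X", frozenset({"GENE_B"})),
    Sample("LINE_X", frozenset({"GENE_A", "GENE_B"})),
    Sample("LINE_Y", frozenset({"GENE_A"})),
]

single_train, single_test = split_unseen_single(samples, {"GENE_A"})
double_train, double_test = split_unseen_double(
    samples, {frozenset({"GENE_A", "GENE_B"})}
)
line_train, line_test = split_unseen_cell_line(samples, "LINE_Y")
```

In this sketch the unseen-double split leaves the constituent single perturbations in training, so the model must predict a combination from its parts. A stricter variant could also hold out one or both constituent genes.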