Interpretable Perturbation Modeling Through Biomedical Knowledge Graphs

📅 2025-12-23
🤖 AI Summary
This study addresses the challenge of quantitatively modeling small-molecule-induced perturbations in cellular gene expression in order to elucidate drug transcriptional mechanisms, predict off-target effects, and identify drug repurposing opportunities. Existing knowledge graph (KG) models have mostly been applied to link prediction and binary drug-disease association; to move beyond that limitation, we propose a KG-driven graph neural network for gene expression perturbation prediction. Specifically, we construct a heterogeneous graph by integrating PrimeKG++ with LINCS L1000 drug and cell line nodes, initialize nodes with multimodal embeddings from foundation models such as MolFormerXL and BioBERT, and train a graph attention network (GAT) coupled with a differentially expressed gene (DEG) prediction head to enable interpretable, cell-line- and compound-specific prediction of expression changes across the 978 landmark genes. Our model outperforms MLP baselines under both scaffold and random splits, and ablation studies with edge shuffling and node feature randomization indicate that KG structural information improves performance.
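The scaffold split mentioned above can be illustrated with a minimal sketch. It assumes each compound already has a scaffold key (for example a Bemis-Murcko scaffold string computed elsewhere) and uses a common largest-group-first assignment. The function name, the toy data, and the 70/30 fraction are hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def scaffold_split(compound_scaffolds, train_fraction=0.8):
    """Split compound IDs so that no scaffold appears in both train and test.

    compound_scaffolds: dict mapping compound ID -> scaffold key
    (hypothetical input; the key is assumed to be computed elsewhere).
    """
    groups = defaultdict(list)
    for compound_id, scaffold in compound_scaffolds.items():
        groups[scaffold].append(compound_id)

    # Largest scaffold groups first; ties broken by key for determinism.
    ordered = sorted(groups.items(), key=lambda kv: (-len(kv[1]), kv[0]))

    train_budget = train_fraction * len(compound_scaffolds)
    train, test = [], []
    for _, members in ordered:
        # A whole scaffold group goes to one side, never both.
        if len(train) + len(members) <= train_budget:
            train.extend(members)
        else:
            test.extend(members)
    return train, test


# Toy data: six compounds spread over three scaffolds.
toy = {
    "cpd1": "scafA", "cpd2": "scafA", "cpd3": "scafA",
    "cpd4": "scafB", "cpd5": "scafB",
    "cpd6": "scafC",
}
train_ids, test_ids = scaffold_split(toy, train_fraction=0.7)
```

Unlike a random split, this keeps structurally related compounds together, so test performance reflects generalization to unseen chemotypes rather than to near-duplicates of training compounds.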

📝 Abstract
Understanding how small molecules perturb gene expression is essential for uncovering drug mechanisms, predicting off-target effects, and identifying repurposing opportunities. Prior deep learning frameworks have integrated multimodal embeddings into biomedical knowledge graphs (BKGs) and further improved these representations through graph neural network message-passing paradigms. However, these models have been applied to tasks such as link prediction and binary drug-disease association rather than to gene perturbation prediction, which may reveal more about mechanistic transcriptomic effects. To address this gap, we construct a merged biomedical graph that integrates (i) PrimeKG++, an augmentation of PrimeKG containing semantically rich node embeddings, with (ii) LINCS L1000 drug and cell line nodes, initialized with multimodal embeddings from foundation models such as MolFormerXL and BioBERT. Using this heterogeneous graph, we train a graph attention network (GAT) with a downstream prediction head that learns the delta expression profile of the 978 landmark genes for a given drug-cell pair. Our results show that, under both scaffold and random splits, our framework outperforms MLP baselines on differentially expressed gene (DEG) prediction; these baselines predict the delta expression from a concatenated embedding of drug features, target features, and baseline cell expression. Ablation experiments with edge shuffling and node feature randomization further demonstrate that the edges provided by biomedical KGs enhance perturbation-level prediction. More broadly, our framework provides a path toward mechanistic drug modeling: moving beyond binary drug-disease association tasks to granular transcriptional effects of therapeutic intervention.
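The two ablations named in the abstract, edge shuffling and node feature randomization, can be sketched as follows. This is one plausible reading, not the authors' procedure: destinations are permuted across all edges regardless of edge type, and features are replaced with standard Gaussian noise. All names and toy values are hypothetical.

```python
import random


def shuffle_edges(edges, seed=0):
    """Return edges with destination nodes permuted across edges.

    Sources and the multiset of destinations are preserved, so every node
    keeps its out-degree and in-degree, but the wiring is scrambled.
    (Assumption: edge types are ignored in this simplified sketch.)
    """
    rng = random.Random(seed)
    sources = [src for src, _ in edges]
    targets = [dst for _, dst in edges]
    rng.shuffle(targets)
    return list(zip(sources, targets))


def randomize_node_features(features, seed=0):
    """Replace each node's feature vector with Gaussian noise of equal length."""
    rng = random.Random(seed)
    return {
        node: [rng.gauss(0.0, 1.0) for _ in vector]
        for node, vector in features.items()
    }


# Toy graph: drug -> gene and gene -> pathway edges.
edges = [
    ("drugA", "geneX"),
    ("drugA", "geneY"),
    ("drugB", "geneZ"),
    ("geneX", "pathway1"),
]
shuffled = shuffle_edges(edges, seed=42)

features = {"drugA": [0.1, 0.2, 0.3], "geneX": [1.0, 0.0, 0.5]}
noisy = randomize_node_features(features, seed=42)
```

If a model trained on the shuffled graph performs about as well as one trained on the real graph, the edges carry little usable signal; a clear drop, as the abstract reports, points to the KG structure itself contributing to prediction.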
Problem

Research questions and friction points this paper is trying to address.

Predicts drug-induced gene expression changes using biomedical knowledge graphs.
Models transcriptional effects of drugs beyond binary disease associations.
Integrates multimodal embeddings to enhance perturbation prediction accuracy.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Graph attention network predicts gene expression changes
Heterogeneous biomedical knowledge graph integrates multimodal embeddings
Framework moves beyond binary associations to mechanistic drug modeling
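To make the graph attention idea concrete, here is a minimal single-head attention layer in plain Python. It follows the standard GAT formulation (LeakyReLU-scored, softmax-normalized neighbor weights) and omits the output nonlinearity. The source does not specify the exact GAT variant, so the weights `W` and `a`, the node names, and the toy features are all hypothetical.

```python
import math


def leaky_relu(x, slope=0.2):
    return x if x > 0 else slope * x


def matvec(W, h):
    """Multiply matrix W (list of rows) by vector h."""
    return [sum(w * x for w, x in zip(row, h)) for row in W]


def gat_layer(features, neighbors, W, a):
    """Single-head graph attention over each node's neighbor list.

    features:  dict node -> input vector
    neighbors: dict node -> list of neighbor nodes (include the node itself
               if a self-loop is wanted)
    W:         projection matrix (out_dim x in_dim)
    a:         attention vector of length 2 * out_dim
    """
    projected = {n: matvec(W, h) for n, h in features.items()}
    outputs, attention = {}, {}
    for i, nbrs in neighbors.items():
        # Score each neighbor j: LeakyReLU(a . [W h_i || W h_j]).
        scores = [
            leaky_relu(sum(ak * zk for ak, zk in zip(a, projected[i] + projected[j])))
            for j in nbrs
        ]
        # Numerically stable softmax over the neighborhood.
        top = max(scores)
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        alphas = [e / total for e in exps]
        attention[i] = dict(zip(nbrs, alphas))
        # Attention-weighted sum of projected neighbor features.
        outputs[i] = [
            sum(alpha * projected[j][k] for alpha, j in zip(alphas, nbrs))
            for k in range(len(projected[i]))
        ]
    return outputs, attention


# Toy example: one drug node attending to itself and two gene nodes.
features = {"drug": [1.0, 0.0], "gene1": [0.0, 1.0], "gene2": [1.0, 1.0]}
neighbors = {"drug": ["drug", "gene1", "gene2"]}
W = [[1.0, 0.0], [0.0, 1.0]]   # identity projection, for readability
a = [0.5, -0.5, 1.0, 0.0]      # arbitrary attention vector
outputs, attention = gat_layer(features, neighbors, W, a)
```

The per-neighbor weights in `attention` are what make this kind of layer inspectable: for a given drug node, they indicate which neighboring entities contributed most to its updated representation.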
Pascal Passigan
Massachusetts Institute of Technology
Kevin Zhu
PhD, Stanford University; Professor of Business+Technology, University of California, San Diego
IT, data, e-commerce, software, digital transformation
Angelina Ning
Massachusetts Institute of Technology