OOD-GraphLLM: Graph Large Language Model for Out-of-Distribution Generalized Drug Synergy Prediction

📅 2026-05-28

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Existing methods for drug synergy prediction assume that training and test data are drawn from the same distribution, so they struggle to generalize to out-of-distribution (O.O.D.) scenarios that arise when novel compounds with unseen molecular topologies are introduced. This work proposes OOD-GraphLLM, described by the authors as the first graph large language model framework explicitly designed for O.O.D. generalization in drug synergy prediction. By integrating graph neural networks with a biomedical large language model, it jointly optimizes molecular graph representations and biomedical semantic representations. The authors also fine-tune a biomedical LLM, DrugSyn-LLM, with a retrieval-augmented biomedical instruction tuning strategy that aligns molecular topology with linguistic semantics and supports language-based reasoning. The authors report that the framework improves prediction accuracy under O.O.D. settings. The code and an interactive web tool are publicly available.

📝 Abstract

Drug synergy prediction (DSP) aims to identify efficacious drug combinations under various cellular contexts with different targets. However, the continual emergence of novel compounds leads to variations in molecular scaffolds and sizes, so drug synergy data exhibit out-of-distribution (O.O.D.) shifts with respect to topological structure. Existing work relies on the in-distribution (I.D.) assumption and therefore fails to handle these O.O.D. shifts. To solve this problem, we study out-of-distribution generalized drug synergy prediction through a graph large language model for the first time. O.O.D. generalized DSP is highly non-trivial and poses several challenges: i) how to discover structurally relevant and irrelevant molecular representations with respect to cell targets; ii) how to find the optimal graph neural architectures that accurately compute molecular representations; and iii) how to jointly leverage molecular structural and semantic information in LLMs. To address these challenges, we propose OOD-GraphLLM, a novel graph LLM framework that accurately predicts drug synergy under O.O.D. settings by jointly optimizing molecular graph representations and biomedical semantic language representations in a unified manner. Furthermore, we fine-tune DrugSyn-LLM, a biomedical LLM, and employ a retrieval-augmented biomedical instruction tuning strategy to align molecular topological information and molecular semantic information with language-based reasoning for O.O.D. generalized DSP. Both the source code (https://github.com/EkkoXiao/Bio-GraphLLM) and the released model (https://mn.cs.tsinghua.edu.cn/bio-graphllm/) are publicly available; users can download model resources and use the system interactively through a web interface.
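
To make the notion of an O.O.D. shift in molecular size concrete, the following is a minimal sketch of a size-based train/test split, not the authors' protocol. The `DrugPair` record, the `size_ood_split` helper, the drug names, the atom counts, and the synergy scores are all hypothetical and chosen only for illustration; the paper's actual splitting procedure is not described in this abstract.

```python
from typing import Dict, List, NamedTuple, Tuple


class DrugPair(NamedTuple):
    """One synergy record (hypothetical schema, for illustration only)."""
    drug_a: str
    drug_b: str
    cell_line: str
    synergy: float


def size_ood_split(
    pairs: List[DrugPair],
    atom_counts: Dict[str, int],
    train_fraction: float = 0.8,
) -> Tuple[List[DrugPair], List[DrugPair]]:
    """Hold out the largest drugs (by atom count) so that every test pair
    contains at least one drug never seen during training."""
    # Sort drugs from smallest to largest; the name breaks ties deterministically.
    drugs = sorted(atom_counts, key=lambda d: (atom_counts[d], d))
    cutoff = max(1, int(len(drugs) * train_fraction))
    train_drugs = set(drugs[:cutoff])

    train, test = [], []
    for pair in pairs:
        if pair.drug_a in train_drugs and pair.drug_b in train_drugs:
            train.append(pair)  # both drugs are in the "seen" size range
        else:
            test.append(pair)   # at least one held-out (larger) drug
    return train, test


# Toy data: names, sizes, and scores are made up.
atom_counts = {"A": 10, "B": 15, "C": 20, "D": 30, "E": 45}
pairs = [
    DrugPair("A", "B", "cell_1", 0.2),
    DrugPair("C", "D", "cell_1", 0.7),
    DrugPair("A", "E", "cell_2", 0.1),
    DrugPair("B", "E", "cell_2", 0.9),
    DrugPair("B", "C", "cell_3", 0.4),
]
train_pairs, test_pairs = size_ood_split(pairs, atom_counts)
print(len(train_pairs), len(test_pairs))
```

With this toy data, the largest drug ("E") is held out, so every test pair involves a molecule whose size lies outside the training range. A scaffold-based split would follow the same pattern, grouping drugs by scaffold instead of sorting them by atom count.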

Problem

Research questions and friction points this paper is trying to address.

out-of-distribution

drug synergy prediction

molecular representation

graph neural networks

distribution shift

Innovation

Methods, ideas, or system contributions that make the work stand out.

Graph Large Language Model

Out-of-Distribution Generalization

Drug Synergy Prediction