🤖 AI Summary
Conventional drug–drug interaction (DDI) prediction methods suffer from poor generalizability and difficulty integrating heterogeneous biomedical knowledge. Method: This study presents a systematic investigation of large language models (LLMs) for zero-shot and fine-tuned DDI prediction. We propose a paradigm that unifies multi-source biomedical knowledge (SMILES strings, protein targets, genes, and signaling pathways) into standardized textual representations, and we construct a benchmark DDI text dataset derived from DrugBank. Evaluation spans 18 state-of-the-art LLMs (e.g., GPT-4, Phi-3.5, Qwen-2.5) across 13 external validation datasets. Contribution/Results: Fine-tuning Phi-3.5 (2.7B parameters) achieves a sensitivity of 0.978 and a balanced accuracy of 0.919, surpassing both zero-shot LLMs and traditional machine-learning baselines and establishing a new state of the art for DDI prediction. The results demonstrate that LLMs can implicitly capture multi-level drug–gene–pathway interactions, validating their potential as expressive, knowledge-integrated predictors.
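The "unified textual representation" idea can be illustrated with a minimal sketch: each drug's heterogeneous records (SMILES, targets, genes, pathways) are flattened into one standardized prompt string that an LLM can classify. The field layout, wording, and example drug records below are illustrative assumptions, not the paper's exact template.

```python
# Hypothetical sketch: flatten multi-source drug knowledge into one
# standardized text prompt for binary DDI classification by an LLM.

def drug_to_text(name, smiles, targets, genes, pathways):
    """Render one drug's heterogeneous records as plain text."""
    return (
        f"Drug: {name}\n"
        f"SMILES: {smiles}\n"
        f"Protein targets: {', '.join(targets)}\n"
        f"Genes: {', '.join(genes)}\n"
        f"Pathways: {', '.join(pathways)}"
    )

def build_ddi_prompt(drug_a, drug_b):
    """Pair two drug descriptions into a yes/no DDI query."""
    return (
        "Given the following two drugs, answer 'yes' if they are likely "
        "to interact and 'no' otherwise.\n\n"
        f"{drug_to_text(**drug_a)}\n\n{drug_to_text(**drug_b)}\n\nAnswer:"
    )

# Illustrative records only; real entries would come from DrugBank.
aspirin = dict(name="Aspirin",
               smiles="CC(=O)OC1=CC=CC=C1C(=O)O",
               targets=["PTGS1", "PTGS2"],
               genes=["PTGS1"],
               pathways=["Arachidonic acid metabolism"])
warfarin = dict(name="Warfarin",
                smiles="CC(=O)CC(C1=CC=CC=C1)C2=C(C3=CC=CC=C3OC2=O)O",
                targets=["VKORC1"],
                genes=["VKORC1", "CYP2C9"],
                pathways=["Vitamin K cycle"])

print(build_ddi_prompt(aspirin, warfarin))
```

The same serialized form serves both zero-shot prompting and fine-tuning, which is what lets one text dataset drive all 18 evaluated models.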
📝 Abstract
The increasing use of drug combinations in modern therapeutic regimens demands reliable methods for predicting drug-drug interactions (DDIs). While Large Language Models (LLMs) have transformed many domains, their potential in pharmaceutical research, particularly DDI prediction, remains largely unexplored. This study systematically investigates LLMs' ability to predict DDIs by processing molecular structures (SMILES), target organisms, and gene interaction data as raw text drawn from the latest DrugBank release. We evaluated 18 different LLMs, including proprietary models (GPT-4, Claude, Gemini) and open-source variants (from 1.5B to 72B parameters), first assessing their zero-shot DDI-prediction capabilities. We then fine-tuned selected models (GPT-4, Phi-3.5 2.7B, Qwen-2.5 3B, Gemma-2 9B, and DeepSeek-R1 distilled Qwen 1.5B) to optimize their performance. Our evaluation framework included validation across 13 external DDI datasets and comparison against traditional approaches such as L2-regularized logistic regression. Fine-tuned LLMs demonstrated superior performance: Phi-3.5 2.7B achieved a sensitivity of 0.978 and an accuracy of 0.919 on balanced datasets (50% positive, 50% negative cases), improving over both zero-shot predictions and state-of-the-art machine-learning methods for DDI prediction. Our analysis reveals that LLMs can effectively capture complex molecular interaction patterns and cases where drug pairs target common genes, making them valuable tools for practical applications in pharmaceutical research and clinical settings.
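The two headline metrics, sensitivity and balanced accuracy on a 50/50 test split, follow directly from the confusion matrix. A minimal sketch of their computation (with made-up labels, not the paper's data):

```python
# Sensitivity = TP / (TP + FN); balanced accuracy = mean of sensitivity
# and specificity. Illustrative binary labels only.

def sensitivity_and_balanced_accuracy(y_true, y_pred):
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    sens = tp / (tp + fn)            # true-positive rate (recall)
    spec = tn / (tn + fp)            # true-negative rate
    return sens, (sens + spec) / 2   # balanced accuracy

# Toy example: 4 interacting pairs (1) and 4 non-interacting pairs (0).
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 1, 1, 0, 0, 0]
sens, bal_acc = sensitivity_and_balanced_accuracy(y_true, y_pred)
print(sens, bal_acc)  # 1.0 0.875
```

On a perfectly balanced dataset, balanced accuracy coincides with plain accuracy, which is why the 0.919 figure can be read either way here.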