🤖 AI Summary
Conventional drug–drug interaction (DDI) prediction methods suffer from poor generalizability and difficulty integrating heterogeneous biomedical knowledge. Method: This study presents a systematic investigation of large language models (LLMs) for zero-shot and fine-tuned DDI prediction. We propose a paradigm that unifies multi-source biomedical knowledge (SMILES strings, protein targets, genes, and signaling pathways) into standardized textual representations, and we construct a benchmark DDI text dataset derived from DrugBank. Evaluation spans 18 state-of-the-art LLMs (e.g., GPT-4, Phi-3.5, Qwen-2.5) across 13 external validation datasets. Contribution/Results: Fine-tuning Phi-3.5 (2.7B parameters) achieves a sensitivity of 0.978 and a balanced accuracy of 0.919, surpassing both zero-shot LLMs and traditional machine-learning baselines and establishing a new state of the art for DDI prediction. The results demonstrate that LLMs can implicitly capture multi-level drug–gene–pathway interactions, validating their potential as expressive, knowledge-integrated predictors.
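The "unified textual representation" idea can be illustrated with a minimal sketch: each drug's heterogeneous records (SMILES, targets, genes, pathways) are flattened into one standardized prompt string that an LLM can classify. The field layout, wording, and example drug records below are illustrative assumptions, not the paper's exact template.

```python
# Hypothetical sketch: flatten multi-source drug knowledge into one
# standardized text prompt for binary DDI classification by an LLM.

def drug_to_text(name, smiles, targets, genes, pathways):
    """Render one drug's heterogeneous records as plain text."""
    return (
        f"Drug: {name}\n"
        f"SMILES: {smiles}\n"
        f"Protein targets: {', '.join(targets)}\n"
        f"Genes: {', '.join(genes)}\n"
        f"Pathways: {', '.join(pathways)}"
    )

def build_ddi_prompt(drug_a, drug_b):
    """Pair two drug descriptions into a yes/no DDI query."""
    return (
        "Given the following two drugs, answer 'yes' if they are likely "
        "to interact and 'no' otherwise.\n\n"
        f"{drug_to_text(**drug_a)}\n\n{drug_to_text(**drug_b)}\n\nAnswer:"
    )

# Illustrative records only; real entries would come from DrugBank.
aspirin = dict(name="Aspirin",
               smiles="CC(=O)OC1=CC=CC=C1C(=O)O",
               targets=["PTGS1", "PTGS2"],
               genes=["PTGS1"],
               pathways=["Arachidonic acid metabolism"])
warfarin = dict(name="Warfarin",
                smiles="CC(=O)CC(C1=CC=CC=C1)C2=C(C3=CC=CC=C3OC2=O)O",
                targets=["VKORC1"],
                genes=["VKORC1", "CYP2C9"],
                pathways=["Vitamin K cycle"])

print(build_ddi_prompt(aspirin, warfarin))
```

The same serialized form serves both zero-shot prompting and fine-tuning, which is what lets one text dataset drive all 18 evaluated models.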
📝 Abstract
The increasing use of drug combinations in modern therapeutic regimens demands reliable methods for predicting drug-drug interactions (DDIs). While Large Language Models (LLMs) have transformed many domains, their potential in pharmaceutical research, particularly DDI prediction, remains largely unexplored. This study systematically investigates LLMs' ability to predict DDIs by processing molecular structures (SMILES), target organisms, and gene interaction data as raw text drawn from the latest DrugBank release. We evaluated 18 different LLMs, including proprietary models (GPT-4, Claude, Gemini) and open-source variants (from 1.5B to 72B parameters), first assessing their zero-shot DDI-prediction capabilities. We then fine-tuned selected models (GPT-4, Phi-3.5 2.7B, Qwen-2.5 3B, Gemma-2 9B, and DeepSeek-R1 distilled Qwen 1.5B) to optimize their performance. Our evaluation framework included validation across 13 external DDI datasets and comparison against traditional approaches such as L2-regularized logistic regression. Fine-tuned LLMs demonstrated superior performance: Phi-3.5 2.7B achieved a sensitivity of 0.978 and an accuracy of 0.919 on balanced datasets (50% positive, 50% negative cases), improving over both zero-shot predictions and state-of-the-art machine-learning methods for DDI prediction. Our analysis reveals that LLMs can effectively capture complex molecular interaction patterns and cases where drug pairs target common genes, making them valuable tools for practical applications in pharmaceutical research and clinical settings.
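The two headline metrics, sensitivity and balanced accuracy on a 50/50 test split, follow directly from the confusion matrix. A minimal sketch of their computation (with made-up labels, not the paper's data):

```python
# Sensitivity = TP / (TP + FN); balanced accuracy = mean of sensitivity
# and specificity. Illustrative binary labels only.

def sensitivity_and_balanced_accuracy(y_true, y_pred):
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    sens = tp / (tp + fn)            # true-positive rate (recall)
    spec = tn / (tn + fp)            # true-negative rate
    return sens, (sens + spec) / 2   # balanced accuracy

# Toy example: 4 interacting pairs (1) and 4 non-interacting pairs (0).
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 1, 1, 0, 0, 0]
sens, bal_acc = sensitivity_and_balanced_accuracy(y_true, y_pred)
print(sens, bal_acc)  # 1.0 0.875
```

On a perfectly balanced dataset, balanced accuracy coincides with plain accuracy, which is why the 0.919 figure can be read either way here.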