Comparative Performance Evaluation of Large Language Models for Extracting Molecular Interactions and Pathway Knowledge

📅 2023-07-17
🏛️ arXiv.org
📈 Citations: 10
Influential: 0
🤖 AI Summary
Traditional biomedical knowledge extraction relies heavily on manual curation, resulting in low scalability and efficiency. Method: This study conducts the first systematic evaluation of large language models (LLMs) for genome-scale molecular interaction and pathway knowledge extraction. We integrate BioBERT, LLaMA, and ChatGLM with prompt engineering and supervised fine-tuning using gold-standard databases, including STRING, KEGG, and Reactome, alongside zero-shot inference. Results: Large models significantly outperform smaller ones, achieving moderate performance (F1 ≈ 0.62) on protein–protein interaction identification, radiation-response pathway gene discovery, and gene regulatory relationship parsing. However, critical bottlenecks persist in identifying functionally heterogeneous gene clusters and modeling strongly correlated regulatory relationships. This work provides empirical evidence and methodological guidance for AI-driven, scalable, and automated biological knowledge discovery.
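The summary above reports an F1 score measured against gold-standard databases. As a minimal sketch of how such a score can be computed for a pathway gene-discovery task, the snippet below compares a model's predicted gene set with a reference set. The function name `precision_recall_f1` and both gene lists are hypothetical and purely illustrative; they are not taken from the paper, and the paper's exact scoring protocol may differ.

```python
def precision_recall_f1(predicted, gold):
    """Set-based precision, recall, and F1 for predicted vs. gold-standard genes."""
    # Normalize gene symbols so that "tp53" and "TP53" count as the same gene.
    predicted = {g.strip().upper() for g in predicted if g.strip()}
    gold = {g.strip().upper() for g in gold if g.strip()}

    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0

    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical data: a reference pathway gene list and an LLM's answer.
gold_genes = ["TP53", "ATM", "CDKN1A", "MDM2"]
model_genes = ["tp53", "ATM", "BRCA1"]

precision, recall, f1 = precision_recall_f1(model_genes, gold_genes)
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
```

Here two of the three predicted genes appear in the reference list, so precision is 2/3, recall is 2/4, and F1 is their harmonic mean, 4/7.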
📝 Abstract
Background: Identification of the interactions and regulatory relations between biomolecules plays a pivotal role in understanding complex biological systems and the mechanisms underlying diverse biological functions. However, the collection of such molecular interactions has heavily relied on expert curation in the past, making it labor-intensive and time-consuming. To mitigate these challenges, we propose leveraging the capabilities of large language models (LLMs) to automate genome-scale extraction of this crucial knowledge. Results: In this study, we investigate the efficacy of various LLMs in addressing biological tasks, such as the recognition of protein interactions, identification of genes linked to pathways affected by low-dose radiation, and the delineation of gene regulatory relationships. Overall, the larger models exhibited superior performance, indicating their potential for specific tasks that involve the extraction of complex interactions among genes and proteins. Although these models possessed detailed information for distinct gene and protein groups, they faced challenges in identifying groups with diverse functions and in recognizing highly correlated gene regulatory relationships. Conclusions: By conducting a comprehensive assessment of the state-of-the-art models using well-established molecular interaction and pathway databases, our study reveals that LLMs can identify genes/proteins associated with pathways of interest and predict their interactions to a certain extent. Furthermore, these models can provide important insights, marking a noteworthy stride toward advancing our understanding of biological systems through AI-assisted knowledge discovery.
Problem

Research questions and friction points this paper is trying to address.

Automate genome-scale extraction of molecular interactions using LLMs
Evaluate LLM performance in recognizing protein interactions and gene pathways
Assess LLM capabilities in predicting gene regulatory relationships
Innovation

Methods, ideas, or system contributions that make the work stand out.

Leveraging large language models for genome-scale extraction
Evaluating LLMs on protein interaction recognition
Assessing gene regulatory relationship prediction accuracy
Gilchan Park
Brookhaven National Laboratory
Natural Language Processing, Machine Learning, Ontological Semantics
Byung-Jun Yoon
Texas A&M University, Brookhaven National Laboratory
Optimal Experimental Design, AI for Science, Bioinformatics, Computational Network Biology
Xihaier Luo
Computational Science Initiative, Brookhaven National Laboratory, PO Box 5000, Upton, 11973, NY, USA.
Vanessa López-Marrero
Computational Science Initiative, Brookhaven National Laboratory, PO Box 5000, Upton, 11973, NY, USA.
Shinjae Yoo
Brookhaven National Lab
Machine Learning
Patrick Johnstone
Francis J. Alexander