Large Language Models for Bioinformatics

📅 2025-01-10
🤖 AI Summary
Problem: Bioinformatics large language models (BioLMs) face fundamental challenges, including unclear distinctions from general-purpose LLMs, a lack of standardized evaluation frameworks, and poor clinical applicability. Method: We propose a comprehensive analytical framework featuring a four-dimensional biomedical evaluation schema (privacy/security, interpretability, data/result bias, and cross-domain generalization). Leveraging Transformer architectures, we integrate domain-adaptive pretraining, multimodal sequence modeling, knowledge-enhanced fine-tuning, and clinical semantic alignment across heterogeneous data sources (genomic, proteomic, literature, and electronic health records). Contribution/Results: We systematically survey over 120 BioLMs, establish a unified taxonomy and benchmark suite, identify five persistent bottlenecks, and articulate an evolutionary pathway toward trustworthy, interpretable, and clinically ready BioAI, providing foundational methodology for next-generation biomedical AI.

📝 Abstract
With the rapid advancements in large language model (LLM) technology and the emergence of bioinformatics-specific language models (BioLMs), there is a growing need for a comprehensive analysis of the current landscape, computational characteristics, and diverse applications. This survey aims to address this need by providing a thorough review of BioLMs, focusing on their evolution, classification, and distinguishing features, alongside a detailed examination of training methodologies, datasets, and evaluation frameworks. We explore the wide-ranging applications of BioLMs in critical areas such as disease diagnosis, drug discovery, and vaccine development, highlighting their impact and transformative potential in bioinformatics. We identify key challenges and limitations inherent in BioLMs, including data privacy and security concerns, interpretability issues, biases in training data and model outputs, and domain adaptation complexities. Finally, we highlight emerging trends and future directions, offering valuable insights to guide researchers and clinicians toward advancing BioLMs for increasingly sophisticated biological and clinical applications.
Problem

Research questions and friction points this paper is trying to address.

Large Language Models
Bioinformatics
Challenges and Applications
Innovation

Methods, ideas, or system contributions that make the work stand out.

Biological Informatics
Large Language Models
Disease Diagnosis and Drug Discovery
Authors

Wei Ruan
University of Georgia
Yanjun Lyu
PhD Student of Computer Science, University of Texas at Arlington
Jing Zhang
Department of Computer Science and Engineering, University of Texas at Arlington, TX, USA
Jiazhang Cai
Graduate Student of Statistics, University of Georgia
Statistics, Bioinformatics
Peng Shu
School of Computing, University of Georgia, GA, USA
Yang Ge
Department of Epidemiology and Biostatistics, University of Georgia, Athens, GA, USA
Yao Lu
Department of Epidemiology and Biostatistics, University of Georgia, Athens, GA, USA
Shang Gao
Institute of Plant Breeding, Genetics & Genomics, University of Georgia, Athens, GA, USA
Yue Wang
School of Computing, University of Georgia, GA, USA
Peilong Wang
City of Hope
Physics, AI, Imaging
Lin Zhao
School of Computing, University of Georgia, GA, USA
Tao Wang
Department of Statistics, University of Georgia, GA, USA
Yufang Liu
Department of Statistics, University of Georgia, GA, USA
Luyang Fang
Ph.D. student of Statistics, University of Georgia
statistics, deep learning (LLM), nonparametric, bioinformatics
Ziyu Liu
Department of Statistics, University of Georgia, GA, USA
Zhengliang Liu
University of Georgia
Natural Language Processing, Medical NLP, Medical Image Analysis, Data Visualization
Yiwei Li
School of Computing, University of Georgia, GA, USA
Zihao Wu
University of Georgia
Brain-inspired AI, Artificial General Intelligence, NLP, Medical Image Analysis
Junhao Chen
School of Computing, University of Georgia, GA, USA
Hanqi Jiang
University of Georgia
Medical Image Analysis, Multi-modal Large Language Models
Yi Pan
School of Computing, University of Georgia, GA, USA
Zhenyuan Yang
School of Computing, University of Georgia, GA, USA
Jingyuan Chen
Department of Radiation Oncology, Mayo Clinic, Phoenix, AZ, USA
Shizhe Liang
Institute of Plant Breeding, Genetics & Genomics, University of Georgia, Athens, GA, USA
Wei Zhang
School of Computer and Cyber Sciences, Augusta University, GA, USA
Terry Ma
School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, USA
Yuan Dou
College of Engineering, University of Georgia, Athens, GA, USA
Jianli Zhang
College of Engineering, University of Georgia, Athens, GA, USA
Xinyu Gong
TikTok
Computer vision
Qi Gan
College of Engineering, University of Georgia, Athens, GA, USA
Yusong Zou
College of Engineering, University of Georgia, Athens, GA, USA
Zebang Chen
College of Engineering, University of Georgia, Athens, GA, USA
Yuanxin Qian
College of Engineering, University of Georgia, Athens, GA, USA
Shuo Yu
College of Engineering, University of Georgia, Athens, GA, USA
Jin Lu
School of Computing, University of Georgia, GA, USA
Kenan Song
NU, MIT, ASU, UGA
1d textile, 2d coating, 3d printing, materials-manufacturing-mechanics
Xianqiao Wang
Professor, College of Engineering, Mechanical Engineering, University of Georgia
Computational mechanics, Brain mechanics, Nanomechanics, Cell-NP Interactions, Soft Matters
Andrea Sikora
Clinical Associate Professor, The University of Georgia College of Pharmacy
@AndreaSikora, pharmacy, critical care, cardiology, acute respiratory failure
Gang Li
Department of Radiology, University of North Carolina at Chapel Hill, NC, USA
Xiang Li
Department of Radiology, Massachusetts General Hospital and Harvard Medical School, MA, USA
Quanzheng Li
Massachusetts General Hospital, Harvard Medical School
Image Reconstruction, Medical Image Analysis, Deep Learning in Medicine, Multimodality Medical Data Analysis
Yingfeng Wang
Department of Computer Science and Engineering, University of Tennessee at Chattanooga, TN, USA
Lu Zhang
Department of Computer Science, Indiana University Indianapolis, IN, USA
Yohannes Abate
Department of Physics and Astronomy, University of Georgia, Athens, GA, USA
Lifang He
Associate Professor of Computer Science, Lehigh University
Machine Learning, AI for Health, Medical Imaging, Biomedical Informatics, Tensor Analysis
Wenxuan Zhong
Professor of Statistics, University of Georgia
Dimension Reduction, Metagenomics, Brain Imaging Analysis
Rongjie Liu
Department of Statistics, University of Georgia, GA, USA
Chao Huang
Department of Epidemiology and Biostatistics, University of Georgia, Athens, GA, USA
Wei Liu
Department of Radiation Oncology, Mayo Clinic, Phoenix, AZ, USA
Ye Shen
Baylor College of Medicine
Ping Ma
University of Georgia
big data analytics, nonparametric modeling, computational biology, geophysics
Hongtu Zhu
Kenan Distinguished Professor, The University of North Carolina at Chapel Hill
Medical Imaging Analysis, Statistical Learning, Machine Learning, AI for Two-sided Markets
Yajun Yan
College of Engineering, University of Georgia, Athens, GA, USA
Dajiang Zhu
University of Texas at Arlington
Computer Science, Computational Neuroscience, Medical Imaging
Tianming Liu
Distinguished Research Professor of Computer Science, University of Georgia
Brain, Brain-Inspired AI, LLM, Artificial General Intelligence, Quantum AI