InstructBioMol: Advancing Biomolecule Understanding and Design Following Human Instructions

📅 2024-10-10
🏛️ arXiv.org
📈 Citations: 5
Influential: 0

🤖 AI Summary
This study addresses the gap between AI's computational capabilities and the way researchers express intent in biomolecular design. It tackles three challenges: semantic alignment between natural language and proteins or small molecules, multimodal data integration, and high domain-expertise barriers. To this end, it introduces InstructBioMol, described as the first instruction-aligned large language model designed specifically for biomolecules, built on an any-to-any cross-modal alignment paradigm that maps bidirectionally among natural language, linear sequences (FASTA/SMILES), and 3D structural representations. The architecture integrates graph neural networks, multimodal adapters, and instruction tuning to support end-to-end generation of functional enzymes and drug molecules. Experimentally, the designed enzymes achieve an ESP Score of 70.4, making InstructBioMol the only method to surpass the enzyme-substrate interaction threshold of 60.0 recommended by the ESP developer, and the generated drug molecules show a 10% improvement in binding affinity. The work offers a framework for intention-driven, multimodal biomolecular engineering.

📝 Abstract
Understanding and designing biomolecules, such as proteins and small molecules, is central to advancing drug discovery, synthetic biology, and enzyme engineering. Recent breakthroughs in Artificial Intelligence (AI) have revolutionized biomolecular research, achieving remarkable accuracy in biomolecular prediction and design. However, a critical gap remains between AI's computational power and researchers' intuition, using natural language to align molecular complexity with human intentions. Large Language Models (LLMs) have shown potential to interpret human intentions, yet their application to biomolecular research remains nascent due to challenges including specialized knowledge requirements, multimodal data integration, and semantic alignment between natural language and biomolecules. To address these limitations, we present InstructBioMol, a novel LLM designed to bridge natural language and biomolecules through a comprehensive any-to-any alignment of natural language, molecules, and proteins. This model can integrate multimodal biomolecules as input, and enable researchers to articulate design goals in natural language, providing biomolecular outputs that meet precise biological needs. Experimental results demonstrate InstructBioMol can understand and design biomolecules following human instructions. Notably, it can generate drug molecules with a 10% improvement in binding affinity and design enzymes that achieve an ESP Score of 70.4, making it the only method to surpass the enzyme-substrate interaction threshold of 60.0 recommended by the ESP developer. This highlights its potential to transform real-world biomolecular research.
Problem

Research questions and friction points this paper is trying to address.

Bridging AI capabilities with human biomolecular design goals
Integrating natural language and multimodal biomolecular data
Improving drug and enzyme design accuracy via language models
Innovation

Methods, ideas, or system contributions that make the work stand out.

Aligns natural language with biomolecules comprehensively
Integrates multimodal biomolecular data as input
Generates biomolecules following human instructions precisely
Zhuang Xiang
College of Computer Science and Technology, Zhejiang University, Hangzhou, China; ZJU-Hangzhou Global Scientific and Technological Innovation Center, Hangzhou, China
Keyan Ding
ZJU-Hangzhou Global Scientific and Technological Innovation Center, Hangzhou, China
Tianwen Lyu
ZJU-Hangzhou Global Scientific and Technological Innovation Center, Hangzhou, China; Polytechnic Institute, Zhejiang University, Hangzhou, China
Yinuo Jiang
College of Computer Science and Technology, Zhejiang University, Hangzhou, China; ZJU-Hangzhou Global Scientific and Technological Innovation Center, Hangzhou, China
Xiaotong Li
Peking University
Multimodal LLM, Foundation Model, Transfer Learning
Zhuoyi Xiang
ZJU-Hangzhou Global Scientific and Technological Innovation Center, Hangzhou, China; Polytechnic Institute, Zhejiang University, Hangzhou, China
Zeyuan Wang
PhD, The University of Sydney
NLP, Medical Informatics
Ming Qin
ZJU-Hangzhou Global Scientific and Technological Innovation Center, Hangzhou, China; School of Software Technology, Zhejiang University, Hangzhou, China
Kehua Feng
Ph.D. student, Zhejiang University
Natural Language Processing, Language Model, AI for Science
Jike Wang
College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, China
Qiang Zhang
The ZJU-UIUC Institute, International Campus, Zhejiang University, Haining, China; ZJU-Hangzhou Global Scientific and Technological Innovation Center, Hangzhou, China
Huajun Chen
College of Computer Science and Technology, Zhejiang University, Hangzhou, China; ZJU-Hangzhou Global Scientific and Technological Innovation Center, Hangzhou, China