Prot2Chat: Protein LLM with Early Fusion of Sequence and Structure

📅 2025-02-07
📈 Citations: 0
Influential citations: 0
🤖 AI Summary
Protein function understanding faces challenges including rigid classification paradigms, underutilization of 3D structural information, and the absence of systematic QA evaluation benchmarks. To address these, we propose the first multimodal large language model (MLLM) for protein understanding, unifying amino acid sequence and 3D structure modeling via early-fusion sequence–structure encoding and a protein–text cross-modal adapter, enabling natural-language question answering. The model combines an enhanced ProteinMPNN encoder, a cross-attention adapter, and a LLaMA3 decoder; the encoder is frozen and the decoder is fine-tuned with LoRA. On two benchmark datasets, it achieves statistically significant improvements over baselines in both automated metrics and expert evaluations. Notably, it demonstrates zero-shot conversational generalization, a first in protein science, and exhibits strong robustness and interpretability on structure-aware QA tasks.
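To make the adapter idea concrete: the summary describes a cross-attention adapter in which text hidden states attend over encoded protein tokens before reaching the decoder. The following is a minimal single-head NumPy sketch of that pattern, not the authors' implementation; all dimension names (`d_text`, `d_prot`, `d_model`) and the residual connection are illustrative assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

class CrossModalAdapter:
    """Sketch of a protein-text adapter: text queries attend over protein tokens."""

    def __init__(self, d_text, d_prot, d_model, seed=0):
        rng = np.random.default_rng(seed)
        s = 1.0 / np.sqrt(d_model)
        self.Wq = rng.normal(0, s, (d_text, d_model))  # text states -> queries
        self.Wk = rng.normal(0, s, (d_prot, d_model))  # protein tokens -> keys
        self.Wv = rng.normal(0, s, (d_prot, d_model))  # protein tokens -> values
        self.Wo = rng.normal(0, s, (d_model, d_text))  # back to decoder width

    def __call__(self, text_h, prot_h):
        q = text_h @ self.Wq                 # (n_text, d_model)
        k = prot_h @ self.Wk                 # (n_prot, d_model)
        v = prot_h @ self.Wv                 # (n_prot, d_model)
        attn = softmax(q @ k.T / np.sqrt(q.shape[-1]))  # (n_text, n_prot)
        # Residual add keeps the decoder's original text representation intact.
        return text_h + attn @ v @ self.Wo

# Toy usage: 5 text tokens attend over 20 protein tokens.
adapter = CrossModalAdapter(d_text=16, d_prot=8, d_model=32)
rng = np.random.default_rng(1)
out = adapter(rng.normal(size=(5, 16)), rng.normal(size=(20, 8)))
assert out.shape == (5, 16)  # output stays in the text/decoder space
```

The output has the text sequence's shape, so the fused representation can be fed straight into a frozen or LoRA-tuned language-model decoder.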

📝 Abstract
Proteins play a pivotal role in living organisms, yet understanding their functions presents significant challenges, including the limited flexibility of classification-based methods, the inability to effectively leverage spatial structural information, and the lack of systematic evaluation metrics for protein Q&A systems. To address these limitations, we propose Prot2Chat, a novel framework that integrates multimodal protein representations with natural language through a unified module, enabling large language model (LLM)-driven answer generation. Our model incorporates a modified ProteinMPNN encoder, which encodes protein sequence and structural information in a unified manner, a protein-text adapter with cross-attention mechanisms, and a LLaMA3 decoder. To optimize training efficiency, we freeze the encoder and employ LoRA techniques for the decoder. We conducted experiments on two datasets; both automated metrics and expert evaluations demonstrate the superior performance of our model. Furthermore, zero-shot prediction results highlight its strong generalization capabilities. This framework offers a promising solution for bridging protein domain knowledge with natural language understanding, paving the way for transformative advancements in protein-related research.
Problem

Research questions and friction points this paper is trying to address.

Integrating protein sequence and structure into one representation
Enabling LLM-driven protein Q&A
Improving protein function understanding
Innovation

Methods, ideas, or system contributions that make the work stand out.

Integration of multimodal protein representations with text
Modified ProteinMPNN encoder for unified sequence–structure encoding
LoRA fine-tuning of the decoder for training efficiency
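The efficiency bullet above rests on a simple mechanism: LoRA freezes a pretrained weight matrix and trains only a low-rank additive update. A minimal NumPy sketch of that forward pass follows; the shapes, rank, and `scale` factor are illustrative assumptions, not values from the paper.

```python
import numpy as np

def lora_forward(x, W, A, B, scale=1.0):
    """LoRA forward pass: frozen base weight W plus a trainable
    low-rank update (x @ A) @ B, scaled by a constant."""
    return x @ W + scale * (x @ A) @ B

rng = np.random.default_rng(0)
d_in, d_out, rank = 64, 64, 4

W = rng.normal(size=(d_in, d_out))       # frozen pretrained weight (not trained)
A = rng.normal(size=(d_in, rank)) * 0.01  # trainable down-projection
B = np.zeros((rank, d_out))              # trainable up-projection, zero-initialized

x = rng.normal(size=(3, d_in))
out = lora_forward(x, W, A, B)

# With B zero-initialized, the LoRA path contributes nothing at the start
# of training, so the adapted model begins exactly at the pretrained one.
assert np.allclose(out, x @ W)
```

The trainable parameter count is `d_in * rank + rank * d_out` instead of `d_in * d_out` (here 512 vs 4096), which is why freezing the encoder and applying LoRA to the decoder keeps fine-tuning cheap.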
👥 Authors
Zhicong Wang
School of Computer Science and Technology, Soochow University, Suzhou, China
Zicheng Ma
Peking University
Biophysics · Bioinformatics · Deep learning
Ziqiang Cao
Soochow University
Natural Language Processing
Changlong Zhou
School of Computer Science and Technology, Soochow University, Suzhou, China
Jun Zhang
Changping Laboratory, Academy for Advanced Interdisciplinary Studies, Peking University, Beijing, China
Yiqin Gao
Changping Laboratory, Academy for Advanced Interdisciplinary Studies, Peking University, Beijing, China