Prot2Text-V2: Protein Function Prediction with Multimodal Contrastive Alignment

📅 2025-05-16
📈 Citations: 0
Influential: 0
🤖 AI Summary
Protein function prediction has long been constrained by dependence on structured ontologies and the assumption that similar sequences share functions, limiting the flexibility to generate open-ended natural language functional descriptions. To address this, we propose the first end-to-end protein sequence-to-function text generation framework. Our method encodes input sequences with ESM-3B and decodes functional descriptions with LLaMA-3.1-8B-Instruct, enhanced by a novel Hybrid Sequence-level Contrastive Alignment Learning (H-SCALE) strategy. H-SCALE achieves structure-agnostic alignment between protein embeddings and textual representations through joint mean–standard-deviation pooling. A lightweight nonlinear modality projector bridges the protein language model (PLM) and the large language model (LLM). Leveraging LoRA-based instruction tuning, our approach significantly outperforms traditional methods and LLM baselines under low-homology conditions, achieving state-of-the-art results on BLEU, ROUGE, and BERTScore.

📝 Abstract
Predicting protein function from sequence is a central challenge in computational biology. While existing methods rely heavily on structured ontologies or similarity-based techniques, they often lack the flexibility to express structure-free functional descriptions and novel biological functions. In this work, we introduce Prot2Text-V2, a novel multimodal sequence-to-text model that generates free-form natural language descriptions of protein function directly from amino acid sequences. Our method combines a protein language model as a sequence encoder (ESM-3B) and a decoder-only language model (LLaMA-3.1-8B-Instruct) through a lightweight nonlinear modality projector. A key innovation is our Hybrid Sequence-level Contrastive Alignment Learning (H-SCALE), which improves cross-modal learning by matching mean- and std-pooled protein embeddings with text representations via contrastive loss. After the alignment phase, we apply instruction-based fine-tuning using LoRA on the decoder to teach the model how to generate accurate protein function descriptions conditioned on the protein sequence. We train Prot2Text-V2 on about 250K curated entries from SwissProt and evaluate it under low-homology conditions, where test sequences have low similarity with training samples. Prot2Text-V2 consistently outperforms traditional and LLM-based baselines across various metrics.
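The H-SCALE alignment described in the abstract can be sketched as follows. This is a minimal illustration, not the paper's implementation: it assumes per-residue embeddings from the PLM, a symmetric InfoNCE-style contrastive loss over in-batch negatives, cosine-similarity logits, and a temperature of 0.07 — all of which are common defaults rather than details taken from the paper.

```python
import numpy as np

def mean_std_pool(token_embeddings):
    """Pool per-residue PLM embeddings (seq_len, d) into one
    sequence-level vector by concatenating mean and std (2d,)."""
    mean = token_embeddings.mean(axis=0)
    std = token_embeddings.std(axis=0)
    return np.concatenate([mean, std])

def contrastive_loss(protein_vecs, text_vecs, temperature=0.07):
    """Symmetric InfoNCE-style loss over a batch of matched
    (protein, text) pairs; row i of each array is one pair."""
    p = protein_vecs / np.linalg.norm(protein_vecs, axis=1, keepdims=True)
    t = text_vecs / np.linalg.norm(text_vecs, axis=1, keepdims=True)
    logits = p @ t.T / temperature          # cosine-similarity logits
    labels = np.arange(len(p))              # diagonal pairs are positives

    def xent(lg):
        # numerically stable cross-entropy against the diagonal labels
        lg = lg - lg.max(axis=1, keepdims=True)
        logprob = lg - np.log(np.exp(lg).sum(axis=1, keepdims=True))
        return -logprob[labels, labels].mean()

    # average over both matching directions (protein->text, text->protein)
    return 0.5 * (xent(logits) + xent(logits.T))
```

The loss drives matched protein/text embeddings toward each other while pushing apart the mismatched pairs in the batch, which is the structure-agnostic alignment role H-SCALE plays before the generation stage.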
Problem

Research questions and friction points this paper is trying to address.

Predict protein function from sequence flexibly
Generate natural language descriptions of protein function
Improve cross-modal learning for protein-text alignment
Innovation

Methods, ideas, or system contributions that make the work stand out.

Multimodal sequence-to-text model for protein function
Hybrid Sequence-level Contrastive Alignment Learning (H-SCALE)
Instruction-based fine-tuning using LoRA on decoder
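The "lightweight nonlinear modality projector" listed above can be read as a small MLP that maps the pooled PLM vector into the LLM's hidden space. The sketch below is an assumption-laden illustration: the two-layer shape, GELU activation, and dimensions (a mean–std-pooled ESM-3B vector projected to LLaMA-3.1-8B's 4096-d hidden size) are plausible guesses, not specifics from the paper.

```python
import numpy as np

def gelu(x):
    # tanh approximation of the GELU activation
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))

class ModalityProjector:
    """Two-layer MLP bridging the PLM embedding space to the LLM
    hidden space. All dimensions are illustrative defaults."""

    def __init__(self, d_in=5120, d_hidden=2048, d_out=4096, seed=0):
        rng = np.random.default_rng(seed)
        # small random init; a real model would train these weights
        self.W1 = rng.normal(0.0, 0.02, size=(d_in, d_hidden))
        self.b1 = np.zeros(d_hidden)
        self.W2 = rng.normal(0.0, 0.02, size=(d_hidden, d_out))
        self.b2 = np.zeros(d_out)

    def __call__(self, x):
        # pooled protein vector -> nonlinear hidden layer -> LLM-space vector
        return gelu(x @ self.W1 + self.b1) @ self.W2 + self.b2
```

In a setup like this, the projector's output would be fed to the decoder as a soft conditioning vector, while LoRA adapters on the decoder are the only LLM weights updated during instruction tuning.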