PPI2Text: Captioning Protein-Protein Interactions with Coordinate-Aligned Pair-Map Decoding

📅 2026-05-09
📈 Citations: 0
Influential: 0

🤖 AI Summary
Existing methods treat protein–protein interaction (PPI) modeling as classification and cannot generate free-text descriptions of interactions, which limits knowledge representation and integration with the literature. This work addresses the gap by formulating PPI modeling as a free-text generation task. The authors propose PaCo-RoPE, a coordinate-aligned positional encoding scheme, and release PPI2Text-Dataset, a 351k-pair corpus of free-form PPI descriptions. Their model combines an ESM3 encoder with a Qwen3 decoder and builds a residue-pair map between the two proteins, so that interaction descriptions are generated end to end from amino acid sequences. In the reported experiments, the approach outperforms strong baselines on both linguistic metrics and factuality metrics, the latter scored by a large language model (LLM) judge that compares outputs against raw biological evidence.
📝 Abstract
Protein-protein interaction (PPI) modeling has been widely studied as a binary or multi-label classification task. While emerging multimodal large language models (LLMs) can now describe single proteins, they remain unable to generate free-form descriptions of interactions between protein pairs. Moving beyond controlled-vocabulary annotations, we propose to model PPIs using free-text descriptions, enabling richer expressiveness, improved interpretability, and better integration with literature knowledge bases. We present PPI2Text, a multimodal LLM for free-form PPI captioning from amino acid sequences. It encodes each protein with an ESM3 encoder, constructs a pair map from the two representations to capture interactions across all residue pairs, and autoregressively generates descriptions with a Qwen3 language decoder. We further introduce PaCo-RoPE, a coordinate-aligned positional encoding that aligns each axis of the pair grid with the residue positions of the corresponding protein. In addition, we release PPI2Text-Dataset, a 351k-pair corpus of free-form PPI descriptions aggregated from ten curated biological databases and further synthesized with Gemini under evidence-tiered prompting. PPI2Text consistently outperforms strong baselines across multiple ablation settings and evaluation protocols. It not only achieves higher scores on linguistic metrics against synthesized references, but also excels on factuality metrics, where an LLM-based judge evaluates outputs against raw biological evidence.
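To make the pair-map and coordinate-aligned encoding ideas concrete, here is a minimal NumPy sketch. It is not the authors' implementation: the abstract does not say how the two residue representations are combined or how PaCo-RoPE is parameterized. The sketch assumes an elementwise-product combiner and a generic axial rotary scheme in which half of the channels are rotated by the residue index in protein A (rows) and the other half by the residue index in protein B (columns). The names `pair_map`, `rotate`, and `axial_rope` are hypothetical, and random arrays stand in for ESM3 embeddings.

```python
import numpy as np

def pair_map(h_a, h_b):
    """Build an (La, Lb, d) grid whose cell (i, j) combines residue i of
    protein A with residue j of protein B.
    Assumption: elementwise product; the paper's combiner may differ."""
    return h_a[:, None, :] * h_b[None, :, :]

def rotate(x, pos, base=10000.0):
    """Standard rotary embedding over the last axis of x.
    `pos` holds one integer position per leading index of x."""
    d = x.shape[-1]
    assert d % 2 == 0, "rotary embedding needs an even channel count"
    freqs = base ** (-np.arange(0, d, 2) / d)    # (d/2,)
    ang = pos[..., None] * freqs                 # (..., d/2)
    cos, sin = np.cos(ang), np.sin(ang)
    x1, x2 = x[..., 0::2], x[..., 1::2]          # channel pairs
    out = np.empty_like(x)
    out[..., 0::2] = x1 * cos - x2 * sin
    out[..., 1::2] = x1 * sin + x2 * cos
    return out

def axial_rope(grid):
    """Assumed axial scheme: first half of the channels encodes the row
    (protein A residue index), second half the column (protein B index)."""
    La, Lb, d = grid.shape
    assert d % 4 == 0, "need an even channel count in each half"
    half = d // 2
    rows = np.broadcast_to(np.arange(La)[:, None], (La, Lb))
    cols = np.broadcast_to(np.arange(Lb)[None, :], (La, Lb))
    out = np.empty_like(grid)
    out[..., :half] = rotate(grid[..., :half], rows)
    out[..., half:] = rotate(grid[..., half:], cols)
    return out

# Stand-ins for per-residue encoder outputs (lengths 5 and 7, width 8).
rng = np.random.default_rng(0)
h_a = rng.standard_normal((5, 8))
h_b = rng.standard_normal((7, 8))

grid = pair_map(h_a, h_b)        # shape (5, 7, 8)
encoded = axial_rope(grid)       # same shape, positions baked in
tokens = encoded.reshape(-1, 8)  # flattened pair tokens for a decoder
```

Because each half of the channels is a pure rotation, the encoding leaves the norm of every cell unchanged, and dot products between encoded cells depend only on the row and column offsets between them. That relative-offset property is the usual reason to prefer rotary encodings over learned absolute positions on a grid like this.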
Problem

Research questions and friction points this paper is trying to address.

Protein-Protein Interaction
Free-text Description
Multimodal LLM
Interpretability
Knowledge Integration
Innovation

Methods, ideas, or system contributions that make the work stand out.

PPI captioning
pair-map decoding
coordinate-aligned positional encoding
multimodal LLM
free-text interaction description