PPI2Text: Captioning Protein-Protein Interactions with Coordinate-Aligned Pair-Map Decoding

📅 2026-05-09

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Existing methods struggle to generate free-text descriptions of protein–protein interactions (PPIs), hindering effective knowledge representation and integration. This work addresses this gap by formulating PPI modeling as a free-text generation task for the first time. The authors propose a coordinate-aligned PaCo-RoPE positional encoding scheme and introduce PPI2Text-Dataset, a large-scale dataset for this purpose. Their model leverages an ESM3 encoder and a Qwen3 decoder, augmented with a residue-pair mapping mechanism to enable end-to-end generation of interpretable interaction descriptions directly from amino acid sequences. Experimental results demonstrate that the proposed approach outperforms strong baselines in both linguistic quality and factual accuracy. Furthermore, evaluations by a large language model (LLM) judge confirm that the generated outputs exhibit high consistency with established biological evidence.

📝 Abstract

Protein-protein interaction (PPI) modeling has been widely studied as a binary or multi-label classification task. While emerging multimodal large language models (LLMs) can now describe single proteins, they remain unable to generate free-form descriptions of interactions between protein pairs. Moving beyond controlled-vocabulary annotations, we propose to model PPIs using free-text descriptions, enabling richer expressiveness, improved interpretability, and better integration with literature knowledge bases. We present PPI2Text, a multimodal LLM for free-form PPI captioning from amino acid sequences. It encodes each protein using an ESM3 encoder, constructs a pair map from the two representations to capture interactions across all residue pairs, and autoregressively generates descriptions using a Qwen3 language decoder. We further introduce PaCo-RoPE, a coordinate-aligned positional encoding that aligns each axis of the pair grid with the residue positions of the corresponding protein. In addition, we release PPI2Text-Dataset, a 351k-pair corpus of free-form PPI descriptions aggregated from ten curated biological databases and further synthesized with Gemini under evidence-tiered prompting. PPI2Text consistently outperforms strong baselines across multiple ablation settings and evaluation protocols. It achieves higher scores on linguistic metrics against synthesized references, and it also excels on factuality metrics, where an LLM-based judge evaluates outputs against raw biological evidence.
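To make the pair-map and coordinate-aligned encoding ideas concrete, here is a minimal pure-Python sketch. It is not the authors' implementation: the abstract does not say how the pair map is built or how PaCo-RoPE is parameterized. The sketch assumes (1) each grid cell is the concatenation of one residue embedding from each protein, and (2) a standard axial rotary encoding in which the first half of a cell is rotated by the row index (residue position in protein A) and the second half by the column index (residue position in protein B). The function names `pair_map`, `rotate`, and `axial_rope` are hypothetical.

```python
import math

def pair_map(emb_a, emb_b):
    """Assumed construction: cell (i, j) concatenates residue i of A with residue j of B."""
    return [[ra + rb for rb in emb_b] for ra in emb_a]

def rotate(vec, pos, base=10000.0):
    """Standard RoPE rotation: turn each (even, odd) pair of `vec` by pos * base**(-k/d)."""
    d = len(vec)  # must be even
    out = []
    for k in range(0, d, 2):
        theta = pos * base ** (-k / d)
        c, s = math.cos(theta), math.sin(theta)
        x, y = vec[k], vec[k + 1]
        out += [x * c - y * s, x * s + y * c]
    return out

def axial_rope(grid):
    """Assumed axis alignment: first half of each cell uses row index i, second half column index j."""
    out = []
    for i, row in enumerate(grid):
        new_row = []
        for j, cell in enumerate(row):
            half = len(cell) // 2
            new_row.append(rotate(cell[:half], i) + rotate(cell[half:], j))
        out.append(new_row)
    return out

# Toy stand-ins for encoder outputs: 3 residues for protein A, 2 for protein B, dimension 2 each.
emb_a = [[1.0, 0.0] for _ in range(3)]
emb_b = [[0.0, 1.0] for _ in range(2)]
encoded = axial_rope(pair_map(emb_a, emb_b))  # 3 x 2 grid of 4-dimensional cells
```

In this sketch, the positional signal in the first half of a cell depends only on the residue position in protein A and the signal in the second half only on the position in protein B. That is one plausible reading of "aligning each axis of the pair grid with the residue positions of the corresponding protein". A real model would operate on learned ESM3 embeddings and apply the rotation inside attention rather than on raw features.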

Problem

Research questions and friction points this paper is trying to address.

Protein-Protein Interaction

Free-text Description

Multimodal LLM

Interpretability

Knowledge Integration

Innovation

Methods, ideas, or system contributions that make the work stand out.

PPI captioning

pair-map decoding

coordinate-aligned positional encoding