Elucidating the Design Space of Multimodal Protein Language Models

📅 2025-04-15
📈 Citations: 0
Influential: 0
🤖 AI Summary
Multimodal protein language models (PLMs) suffer from fine-grained geometric information loss and inaccurate structure prediction due to discretizing 3D atomic coordinates into tokenized representations. To address this, we present the first systematic characterization of the design space for geometry-aware PLMs and propose three key innovations: (1) replacing discrete tokenization with continuous-space structural supervision; (2) designing a geometrically aware, structure-sensitive Transformer architecture; and (3) introducing a fine-grained generative representation learning objective. Evaluated on the PDB test set, our 650M-parameter model achieves a substantial reduction in RMSD—from 5.52 Å to 2.36 Å—outperforming a 3B-parameter baseline in folding accuracy while enhancing structural diversity. Its performance rivals that of specialized protein folding models, demonstrating that explicitly incorporating geometric continuity and structural awareness into PLM design significantly advances end-to-end protein structure prediction.

📝 Abstract
Multimodal protein language models (PLMs) integrate sequence and token-based structural information, serving as a powerful foundation for protein modeling, generation, and design. However, tokenizing 3D structures into discrete tokens causes a substantial loss of fidelity in fine-grained structural details and correlations. In this paper, we systematically elucidate the design space of multimodal PLMs to overcome these limitations. We identify tokenization loss and inaccurate structure token predictions by the PLMs as major bottlenecks. To address these, our proposed design space covers improved generative modeling, structure-aware architectures and representation learning, and data exploration. Our advances move toward finer-grained supervision, demonstrating that token-based multimodal PLMs can achieve robust structural modeling. These design methods dramatically improve structure generation diversity and, notably, the folding ability of our 650M model, reducing RMSD from 5.52 Å to 2.36 Å on the PDB test set, outperforming 3B baselines and performing on par with specialized folding models.
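For context on the Å figures quoted above: this sketch is not from the paper, but shows how backbone RMSD between a predicted and a reference structure is conventionally computed, after optimal rigid-body superposition via the Kabsch algorithm. The function name `kabsch_rmsd` is a hypothetical helper, not an API from this work.

```python
import numpy as np

def kabsch_rmsd(P, Q):
    """RMSD (in the same units as the inputs, e.g. Angstroms) between two
    (N, 3) coordinate arrays after optimal rigid-body superposition."""
    P = P - P.mean(axis=0)                   # center both point clouds
    Q = Q - Q.mean(axis=0)
    H = P.T @ Q                              # 3x3 cross-covariance matrix
    U, S, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))   # guard against reflections
    D = np.diag([1.0, 1.0, d])
    R = Vt.T @ D @ U.T                       # optimal rotation mapping P onto Q
    P_aligned = P @ R.T
    return float(np.sqrt(np.mean(np.sum((P_aligned - Q) ** 2, axis=1))))
```

Since the metric is invariant to global rotation and translation, a structure and any rigidly transformed copy of it have an RMSD of zero; only genuine conformational differences contribute.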
Problem

Research questions and friction points this paper is trying to address.

Overcoming fidelity loss in tokenized 3D protein structures
Addressing inaccurate structure token predictions in PLMs
Improving protein structure generation diversity and accuracy
Innovation

Methods, ideas, or system contributions that make the work stand out.

Improved generative modeling for protein structures
Structure-aware architectures enhance representation learning
Data exploration reduces tokenization loss significantly
Cheng-Yen Hsieh
ByteDance Research
Xinyou Wang
School of Computer Science, Nanjing University
Daiheng Zhang
Dept. of ECE, Rutgers University
Dongyu Xue
ByteDance Research
Fei Ye
ByteDance Research
Shujian Huang
School of Computer Science, Nanjing University
Natural Language Processing, Machine Translation, Multilingualism, Large Language Models
Zaixiang Zheng
ByteDance Seed
ML, NLP, AI for Science
Quanquan Gu
Associate Professor of Computer Science, UCLA
AGI, Large Language Models, Reinforcement Learning, Nonconvex Optimization