Chem3DLLM: 3D Multimodal Large Language Models for Chemistry

📅 2025-08-14
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Existing autoregressive language models struggle to generate 3D molecular conformations due to three key limitations: (1) incompatibility between continuous 3D geometric structures and discrete token spaces; (2) difficulty in unifying heterogeneous modalities—such as proteins, ligands, and text—within a single architecture; and (3) absence of physicochemical priors, hindering effective structural constraint enforcement. To address these challenges, we introduce the first protein-conditioned multimodal large language model. Our method features a novel invertible 3D molecular-text encoder—incorporating run-length compression for lossless 3× geometric compression—alongside a protein embedding projector and a stability-driven reinforcement learning optimization framework. The model jointly generates protein binding pockets and ligand conformations, achieving a state-of-the-art −7.21 Vina score on structure-based drug design tasks. This result validates both its scientific soundness and practical efficacy in real-world drug discovery.

Technology Category

Application Category

📝 Abstract
In the real world, a molecule is a 3D geometric structure. Compared to 1D SMILES sequences and 2D molecular graphs, 3D molecules represent the most informative molecular modality. Despite the rapid progress of autoregressive-based language models, they cannot handle the generation of 3D molecular conformation due to several challenges: 1) 3D molecular structures are incompatible with LLMs' discrete token space, 2) integrating heterogeneous inputs like proteins, ligands, and text remains difficult within a unified model, and 3) LLMs lack essential scientific priors, hindering the enforcement of physical and chemical constraints during generation. To tackle these issues, we present Chem3DLLM, a unified protein-conditioned multimodal large language model. Our approach designs a novel reversible text encoding for 3D molecular structures using run-length compression, achieving 3x size reduction while preserving complete structural information. This enables seamless integration of molecular geometry with protein pocket features in a single LLM architecture. We employ reinforcement learning with stability-based rewards to optimize chemical validity and incorporate a lightweight protein embedding projector for end-to-end training. Experimental results on structure-based drug design demonstrate state-of-the-art performance with a Vina score of -7.21, validating our unified multimodal approach for practical drug discovery applications.
Problem

Research questions and friction points this paper is trying to address.

Handling 3D molecular generation incompatible with LLMs' token space
Integrating heterogeneous inputs like proteins, ligands, and text
Enforcing physical and chemical constraints during molecular generation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Reversible text encoding for 3D molecules
Reinforcement learning with stability rewards
Lightweight protein embedding projector
🔎 Similar Papers
No similar papers found.