Towards Unified Latent Space for 3D Molecular Latent Diffusion Modeling

📅 2025-03-19
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing 3D molecular generation methods employ separate latent spaces for atomic types, bond topology, and 3D coordinates, leading to inefficient training and difficulty in enforcing SE(3) equivariance. To address this, we propose the Unified Variational Auto-Encoder for 3D Molecular Latent Diffusion Modeling (UAE-3D), the first model to compress multimodal molecular representations (atomic identity, connectivity, and Cartesian coordinates) into a single homogeneous latent sequence, thereby eliminating modality coupling while preserving SE(3) equivariance in the 3D geometry. Building upon UAE-3D, we design a Diffusion Transformer framework that performs end-to-end latent-space diffusion without requiring explicit molecular priors. Evaluated on GEOM-Drugs and QM9, our method achieves state-of-the-art performance in both de novo and conditional generation: near-zero reconstruction error, significantly improved sampling efficiency, and superior generation quality across all metrics compared to prior approaches.

📝 Abstract
3D molecule generation is crucial for drug discovery and materials science, requiring models to process complex multi-modal data, including atom types, chemical bonds, and 3D coordinates. A key challenge is integrating these modalities of different shapes while maintaining SE(3) equivariance for the 3D coordinates. To achieve this, existing approaches typically maintain separate latent spaces for invariant and equivariant modalities, reducing efficiency in both training and sampling. In this work, we propose the Unified Variational Auto-Encoder for 3D Molecular Latent Diffusion Modeling (UAE-3D), a multi-modal VAE that compresses 3D molecules into latent sequences from a unified latent space while maintaining near-zero reconstruction error. This unified latent space eliminates the complexities of handling multi-modality and equivariance when performing latent diffusion modeling. We demonstrate this by employing the Diffusion Transformer, a general-purpose diffusion model without any molecular inductive bias, for latent generation. Extensive experiments on the GEOM-Drugs and QM9 datasets show that our method sets new benchmarks in both de novo and conditional 3D molecule generation, achieving leading efficiency and quality.
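The abstract's core idea, fusing atom types, bonds, and coordinates into one homogeneous latent sequence, can be illustrated with a toy sketch. This is a hypothetical simplification, not the paper's actual architecture: the token construction (one-hot type, a crude bond-degree summary, raw coordinates) and the random linear "encoder" are stand-ins for UAE-3D's learned multi-modal VAE encoder.

```python
import numpy as np

rng = np.random.default_rng(0)

def molecule_to_tokens(atom_types, coords, adjacency, n_types=5):
    """Pack each atom's type, bond context, and 3D position into one
    homogeneous per-atom token (hypothetical simplification of the
    multi-modal fusion inside UAE-3D)."""
    type_onehot = np.eye(n_types)[atom_types]           # (n, n_types)
    bond_degree = adjacency.sum(axis=1, keepdims=True)  # (n, 1) crude bond summary
    return np.concatenate([type_onehot, bond_degree, coords], axis=1)

def encode_to_latent(tokens, latent_dim=8, seed=0):
    """Toy stand-in for the VAE encoder: a fixed random linear projection
    mapping tokens into a single unified latent space."""
    rng = np.random.default_rng(seed)
    W = rng.standard_normal((tokens.shape[1], latent_dim)) / np.sqrt(tokens.shape[1])
    return tokens @ W  # (n_atoms, latent_dim): one latent vector per atom

# Tiny 3-atom example: hypothetical type ids, random 3D positions, bonds.
atom_types = np.array([0, 1, 1])
coords = rng.standard_normal((3, 3))
adjacency = np.array([[0, 1, 1], [1, 0, 0], [1, 0, 0]], dtype=float)

tokens = molecule_to_tokens(atom_types, coords, adjacency)
latents = encode_to_latent(tokens)
print(tokens.shape, latents.shape)  # (3, 9) (3, 8)
```

The point of the sketch is the shape of the result: all three modalities end up as rows of one ordinary latent sequence, so a downstream generative model never has to treat invariant and equivariant features differently.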
Problem

Research questions and friction points this paper is trying to address.

Integrate multi-modal data for 3D molecule generation.
Maintain SE(3) equivariance in unified latent space.
Improve efficiency and quality in molecular generation.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Unified latent space for multi-modal 3D molecules
SE(3) equivariance maintained in latent sequences
Diffusion Transformer for efficient latent generation
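Because the latent space is a plain sequence of vectors, generation reduces to standard diffusion sampling with no molecular inductive bias. The sketch below shows generic DDPM-style ancestral sampling under assumed hyperparameters (linear beta schedule, 50 steps); the zero-predicting `denoise_fn` is a placeholder for where UAE-3D's Diffusion Transformer would plug in.

```python
import numpy as np

def ddpm_sample(denoise_fn, shape, n_steps=50, seed=0):
    """Generic DDPM ancestral sampling over a latent sequence. Since the
    latents live in one unified space, the sampler needs no equivariance
    constraints or molecule-specific structure."""
    rng = np.random.default_rng(seed)
    betas = np.linspace(1e-4, 0.02, n_steps)  # assumed linear schedule
    alphas = 1.0 - betas
    alpha_bars = np.cumprod(alphas)

    z = rng.standard_normal(shape)  # start from pure Gaussian noise
    for t in reversed(range(n_steps)):
        eps_hat = denoise_fn(z, t)  # predicted noise at step t
        coef = betas[t] / np.sqrt(1.0 - alpha_bars[t])
        z = (z - coef * eps_hat) / np.sqrt(alphas[t])
        if t > 0:  # add noise on all but the final step
            z += np.sqrt(betas[t]) * rng.standard_normal(shape)
    return z

# Placeholder denoiser: in UAE-3D this would be a trained Diffusion
# Transformer; a zero predictor just exercises the sampling loop.
latents = ddpm_sample(lambda z, t: np.zeros_like(z), shape=(16, 8))
print(latents.shape)  # (16, 8)
```

The sampled latent sequence would then be passed through the VAE decoder to recover atom types, bonds, and coordinates; that decoding step is not shown here.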
Authors
Yanchen Luo
University of Science and Technology of China
Zhiyuan Liu
National University of Singapore
Yi Zhao
University of Science and Technology of China
Sihang Li
University of Science and Technology of China
Kenji Kawaguchi
Presidential Young Professor, National University of Singapore
Tat-Seng Chua
National University of Singapore
Xiang Wang
University of Science and Technology of China