Scalable Autoregressive 3D Molecule Generation

📅 2025-05-20
🤖 AI Summary
To address the poor scalability and inferior generation quality of autoregressive models relative to diffusion models in 3D molecule generation, this paper introduces Quetzal, a simple, scalable autoregressive model that builds molecules atom-by-atom in 3D. Its core idea is to treat each molecule as an ordered sequence of atoms: a causal Transformer predicts each next atom's discrete type, while a smaller diffusion MLP models the continuous distribution of that atom's 3D position. Because the expensive dense Transformer is called fewer times, generation is significantly faster, and the model supports exact divergence-based likelihood computation. Without any architectural changes, Quetzal also handles variable-size tasks such as hydrogen decoration and scaffold completion. Experiments show that Quetzal substantially improves on existing autoregressive baselines and is competitive with state-of-the-art diffusion models in generation quality.

📝 Abstract
Generative models of 3D molecular structure play a rapidly growing role in the design and simulation of molecules. Diffusion models currently dominate the space of 3D molecule generation, while autoregressive models have trailed behind. In this work, we present Quetzal, a simple but scalable autoregressive model that builds molecules atom-by-atom in 3D. Treating each molecule as an ordered sequence of atoms, Quetzal combines a causal transformer that predicts the next atom's discrete type with a smaller Diffusion MLP that models the continuous next-position distribution. Compared to existing autoregressive baselines, Quetzal achieves substantial improvements in generation quality and is competitive with the performance of state-of-the-art diffusion models. In addition, by reducing the number of expensive forward passes through a dense transformer, Quetzal enables significantly faster generation speed, as well as exact divergence-based likelihood computation. Finally, without any architectural changes, Quetzal natively handles variable-size tasks like hydrogen decoration and scaffold completion. We hope that our work motivates a perspective on scalability and generality for generative modelling of 3D molecules.
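The atom-by-atom procedure described in the abstract can be illustrated with a toy sampling loop. This is only a sketch under stated assumptions, not the paper's implementation: `next_type_probs` and `sample_position` are hypothetical stand-ins for the causal transformer and the Diffusion MLP, the vocabulary with its `<stop>` token is invented for illustration, and conditioning the position on the sampled type is an assumed ordering.

```python
import random

# Hypothetical vocabulary; the final entry ends generation.
ATOM_TYPES = ["C", "N", "O", "H", "<stop>"]


def next_type_probs(prefix):
    """Stand-in for the causal transformer (assumption, not the real model).

    Returns a categorical distribution over the next atom type given the
    atoms generated so far. Toy rule: the stop probability grows with length.
    """
    p_stop = min(0.9, 0.1 * len(prefix))
    p_atom = (1.0 - p_stop) / (len(ATOM_TYPES) - 1)
    return [p_atom] * (len(ATOM_TYPES) - 1) + [p_stop]


def sample_position(prefix, atom_type, rng, n_steps=10):
    """Stand-in for the Diffusion MLP (assumption, not the real model).

    Starts from Gaussian noise and iteratively moves the sample toward a
    toy conditional target: the centroid of the prefix shifted along x.
    The toy ignores atom_type; a real model would condition on it.
    """
    if prefix:
        target = [sum(pos[i] for _, pos in prefix) / len(prefix) for i in range(3)]
    else:
        target = [0.0, 0.0, 0.0]
    target[0] += 1.5
    x = [rng.gauss(0.0, 1.0) for _ in range(3)]
    for step in range(n_steps):
        # Each step closes a fraction of the remaining gap to the target.
        x = [xi + (ti - xi) / (n_steps - step) for xi, ti in zip(x, target)]
    return tuple(x)


def generate(rng, max_atoms=20):
    """Build a molecule as an ordered list of (type, position) pairs."""
    molecule = []
    while len(molecule) < max_atoms:
        probs = next_type_probs(molecule)
        atom_type = rng.choices(ATOM_TYPES, weights=probs, k=1)[0]
        if atom_type == "<stop>":
            break  # variable-size output: the model decides when to stop
        position = sample_position(molecule, atom_type, rng)
        molecule.append((atom_type, position))
    return molecule


if __name__ == "__main__":
    for atom_type, position in generate(random.Random(0)):
        print(atom_type, [round(c, 3) for c in position])
```

In this sketch the type model is called once per atom, while the iterative refinement runs only inside the small position sampler, which mirrors the abstract's point about reducing forward passes through the dense transformer. Seeding `generate` with a non-empty prefix would be one way to picture scaffold completion, though that usage is also an assumption here.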
Problem

Research questions and friction points this paper is trying to address.

Improving autoregressive 3D molecule generation quality
Speeding up generation and enabling exact likelihood computation
Handling variable-size molecular tasks natively
Innovation

Methods, ideas, or system contributions that make the work stand out.

Autoregressive 3D molecule generation atom-by-atom
Combines causal transformer and Diffusion MLP
Enables faster generation and exact likelihood computation
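
As a rough illustration of what exact, divergence-based likelihood computation could look like for such a model, the log-likelihood of a molecule can be written as a sum over atoms, with each position term evaluated through the standard instantaneous change-of-variables formula for a flow $\dot{x}(t) = v_\theta(x(t), t, c)$. The notation ($a_i$, $x_i$, $c_i$, $v_\theta$, $p_0$), the conditioning order, and the omission of a stop-token term are assumptions made for this sketch and are not taken from the paper.

```latex
% Assumed autoregressive factorization over N atoms
% (a_i: discrete atom type, x_i: 3D position).
\log p_\theta(a_{1:N}, x_{1:N})
  = \sum_{i=1}^{N} \Big[ \log p_\theta(a_i \mid a_{<i}, x_{<i})
  + \log p_\theta(x_i \mid a_{\le i}, x_{<i}) \Big]

% Position term via the instantaneous change of variables, where
% c_i = (a_{\le i}, x_{<i}) is the context, x_i(1) = x_i is the data point,
% and x_i(0) is its image under the flow in the base density p_0.
\log p_\theta(x_i \mid c_i)
  = \log p_0\big(x_i(0)\big)
  - \int_0^1 \nabla \cdot v_\theta\big(x_i(t), t, c_i\big)\, dt
```

Under these assumptions, the discrete terms come directly from the type distribution, and only the low-dimensional position flow requires integrating a divergence.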
Austin H. Cheng
Department of Chemistry, University of Toronto, Toronto, ON, Canada; Department of Computer Science, University of Toronto, Toronto, ON, Canada; Vector Institute for Artificial Intelligence, Toronto, ON, Canada
Chong Sun
Tencent WeChat
Computer Vision
Alán Aspuru-Guzik
Department of Chemistry, University of Toronto, Toronto, ON, Canada; Department of Computer Science, University of Toronto, Toronto, ON, Canada; Vector Institute for Artificial Intelligence, Toronto, ON, Canada; Department of Materials Science & Engineering, University of Toronto, Toronto, ON, Canada; Department of Chemical Engineering & Applied Chemistry, University of Toronto, Toronto, ON, Canada; Senior Fellow, Canadian Institute for Advanced Research (CIFAR), Toronto, ON, Canada; Acceleration Consortium