Uni-3DAR: Unified 3D Generation and Understanding via Autoregression on Compressed Spatial Tokens

📅 2025-03-20
📈 Citations: 0
Influential: 0
🤖 AI Summary
3D structural generation and understanding (3D GU) tasks have largely been developed independently, and autoregressive modeling remains underexplored for them. Method: Uni-3DAR is a unified framework built on hierarchical tokenization: an octree compresses 3D space by exploiting the sparsity of 3D structures, and an additional fine-grained tokenization captures details such as atom types and precise coordinates in microscopic structures (molecules, proteins, polymers, crystals). A two-level subtree compression strategy shortens the octree token sequence by up to 8×, and a masked next-token prediction mechanism handles dynamically varying token positions. Contribution/Results: The approach casts diverse 3D GU tasks as a single autoregressive next-token prediction problem. It surpasses previous state-of-the-art diffusion models by a substantial margin, with up to 256% relative improvement and inference up to 21.8× faster.

📝 Abstract
Recent advancements in large language models and their multi-modal extensions have demonstrated the effectiveness of unifying generation and understanding through autoregressive next-token prediction. However, despite the critical role of 3D structural generation and understanding (3D GU) in AI for science, these tasks have largely evolved independently, with autoregressive methods remaining underexplored. To bridge this gap, we introduce Uni-3DAR, a unified framework that seamlessly integrates 3D GU tasks via autoregressive prediction. At its core, Uni-3DAR employs a novel hierarchical tokenization that compresses 3D space using an octree, leveraging the inherent sparsity of 3D structures. It then applies an additional tokenization for fine-grained structural details, capturing key attributes such as atom types and precise spatial coordinates in microscopic 3D structures. We further propose two optimizations to enhance efficiency and effectiveness. The first is a two-level subtree compression strategy, which reduces the octree token sequence by up to 8x. The second is a masked next-token prediction mechanism tailored for dynamically varying token positions, significantly boosting model performance. By combining these strategies, Uni-3DAR successfully unifies diverse 3D GU tasks within a single autoregressive framework. Extensive experiments across multiple microscopic 3D GU tasks, including molecules, proteins, polymers, and crystals, validate its effectiveness and versatility. Notably, Uni-3DAR surpasses previous state-of-the-art diffusion models by a substantial margin, achieving up to 256% relative improvement while delivering inference speeds up to 21.8x faster. The code is publicly available at https://github.com/dptech-corp/Uni-3DAR.
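The octree idea in the abstract can be illustrated with a small sketch. This is not the Uni-3DAR implementation. It is a minimal example of octree occupancy tokenization that assumes points lie in a cube of edge `size` anchored at the origin, and the function name `octree_tokens` and the 0/1 token convention are hypothetical choices made for illustration. It shows how sparsity helps: only occupied cells are subdivided, so empty space produces no further tokens.

```python
from typing import List, Tuple

Point = Tuple[float, float, float]


def octree_tokens(points: List[Point], depth: int, size: float = 1.0) -> List[List[int]]:
    """Return, for each level, a flat list of 0/1 child-occupancy tokens.

    Illustrative only: every occupied cell emits 8 tokens (one per octant)
    at the next level, and empty cells are never subdivided.
    """
    # Each cell is (origin, edge_length, points_inside).
    cells = [((0.0, 0.0, 0.0), size, list(points))]
    levels: List[List[int]] = []
    for _ in range(depth):
        tokens: List[int] = []
        next_cells = []
        for (ox, oy, oz), edge, pts in cells:
            half = edge / 2.0
            buckets: List[List[Point]] = [[] for _ in range(8)]
            for (x, y, z) in pts:
                # Octant index: one bit per axis (x is the high bit).
                idx = (
                    (int(x >= ox + half) << 2)
                    | (int(y >= oy + half) << 1)
                    | int(z >= oz + half)
                )
                buckets[idx].append((x, y, z))
            for idx, bucket in enumerate(buckets):
                tokens.append(1 if bucket else 0)
                if bucket:
                    child_origin = (
                        ox + half * ((idx >> 2) & 1),
                        oy + half * ((idx >> 1) & 1),
                        oz + half * (idx & 1),
                    )
                    next_cells.append((child_origin, half, bucket))
        levels.append(tokens)
        cells = next_cells
    return levels


levels = octree_tokens([(0.1, 0.1, 0.1), (0.9, 0.9, 0.9)], depth=2)
print(levels[0])       # occupancy of the 8 top-level octants
print(len(levels[1]))  # 8 tokens for each occupied top-level octant
```

With two points in opposite corners, the second level holds 16 tokens instead of the 64 a dense 4×4×4 grid would need, because the six empty top-level octants are never expanded.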
Problem

Research questions and friction points this paper is trying to address.

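3D structural generation and understanding tasks have largely evolved independently, without a shared modeling framework.
Autoregressive next-token prediction remains underexplored for sparse 3D structures.
Octree token sequences need to be shortened, and next-token prediction must cope with dynamically varying token positions.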
Innovation

Methods, ideas, or system contributions that make the work stand out.

Unified 3D generation and understanding via autoregression
Hierarchical tokenization using octree for 3D compression
Masked next-token prediction for dynamic token positions
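To make the "up to 8x" sequence reduction concrete, here is a hedged sketch of one way such a factor could arise: packing the 8 child-occupancy bits of an octree cell into a single token with 256 possible values. This is an assumption made for illustration, not a description of the paper's two-level subtree compression, and the names `pack_children` and `unpack_children` are hypothetical.

```python
from typing import List


def pack_children(bits: List[int]) -> List[int]:
    """Pack each run of 8 child-occupancy bits into one token in [0, 255]."""
    if len(bits) % 8 != 0:
        raise ValueError("expected a multiple of 8 occupancy bits")
    packed: List[int] = []
    for start in range(0, len(bits), 8):
        token = 0
        for bit in bits[start:start + 8]:
            token = (token << 1) | bit  # first child becomes the high bit
        packed.append(token)
    return packed


def unpack_children(tokens: List[int]) -> List[int]:
    """Invert pack_children: expand each token back into 8 occupancy bits."""
    bits: List[int] = []
    for token in tokens:
        bits.extend((token >> shift) & 1 for shift in range(7, -1, -1))
    return bits


bits = [1, 0, 0, 0, 0, 0, 0, 1,
        0, 0, 0, 0, 0, 0, 0, 1]
packed = pack_children(bits)
print(packed)                           # [129, 1]
print(len(bits) // len(packed))         # 8
print(unpack_children(packed) == bits)  # True
```

The trade-off in this sketch is a larger vocabulary (256 values instead of 2) in exchange for a sequence that is 8 times shorter, which matters for autoregressive models whose cost grows with sequence length.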
Shuqi Lu
DP Technology, Beijing, 100080, China.
Haowei Lin
Peking University
LLM, AI4Science
Lin Yao
DP Technology, Beijing, 100080, China.
Zhifeng Gao
DP Technology
Data Mining, Machine Learning, AI for Science, AI for Industry
Xiaohong Ji
DP Technology, Beijing, 100080, China.
E Weinan
AI for Science Institute, Beijing 100080, China; School of Mathematical Sciences, Peking University, Beijing, 100871, China; Center for Machine Learning Research, Peking University, Beijing, 100084, China.
Linfeng Zhang
DP Technology; AI for Science Institute
AI for Science, multi-scale modeling, molecular simulation, drug/materials design
Guolin Ke
DP Technology
Machine Learning, AI for Science