MoCA: Mixture-of-Components Attention for Scalable Compositional 3D Generation

📅 2025-12-08
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing part-aware 3D generation methods rely on dense global attention, whose computational cost grows quadratically with the number of components. This hinders scalable, fine-grained compositional 3D asset synthesis. To address this, we propose a scalable sparse global attention framework: components are first ranked by importance; top-k routing then selects the salient components, while the non-salient ones are compressed so that their contextual priors are preserved at much lower computational cost. We further introduce a hybrid attention mechanism that jointly models local geometric details and global structural relationships. Experiments show that our method outperforms existing baselines on compositional 3D object and scene generation, enabling high-fidelity, efficient synthesis for scenes comprising hundreds of components.

📝 Abstract
Compositionality is critical for 3D object and scene generation, but existing part-aware 3D generation methods scale poorly because the cost of global attention grows quadratically with the number of components. In this work, we present MoCA, a compositional 3D generative model with two key designs: (1) importance-based component routing, which selects the top-k relevant components for sparse global attention, and (2) unimportant-component compression, which preserves the contextual priors of unselected components while reducing the computational complexity of global attention. With these designs, MoCA enables efficient, fine-grained compositional 3D asset creation with a scalable number of components. Extensive experiments show that MoCA outperforms baselines on both compositional object and scene generation tasks. Project page: https://lizhiqi49.github.io/MoCA
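
The two designs named in the abstract can be illustrated with a minimal pure-Python sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the importance score is a dot product of mean-pooled component tokens, the compression is mean pooling to a single token, attention is single-head with keys reused as values, and the function names (`route`, `sparse_global_attention`) are hypothetical.

```python
import math
import random


def mean_pool(tokens):
    """Average a list of token vectors into one vector (assumed compression)."""
    dim = len(tokens[0])
    return [sum(t[d] for t in tokens) / len(tokens) for d in range(dim)]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def attend(queries, keys):
    """Single-head softmax attention; keys double as values for brevity."""
    scale = 1.0 / math.sqrt(len(keys[0]))
    out = []
    for q in queries:
        logits = [dot(q, k) * scale for k in keys]
        peak = max(logits)  # subtract the max for numerical stability
        weights = [math.exp(l - peak) for l in logits]
        total = sum(weights)
        out.append([
            sum(w * k[d] for w, k in zip(weights, keys)) / total
            for d in range(len(q))
        ])
    return out


def route(summaries, i, top_k):
    """Rank the other components by similarity to component i (assumed score).

    Returns (selected, rest): the top_k most similar indices and the remainder.
    """
    others = [j for j in range(len(summaries)) if j != i]
    others.sort(key=lambda j: dot(summaries[i], summaries[j]), reverse=True)
    return others[:top_k], others[top_k:]


def sparse_global_attention(components, top_k):
    """components: list of N components, each a list of token vectors."""
    summaries = [mean_pool(c) for c in components]
    outputs, context_sizes = [], []
    for i, tokens in enumerate(components):
        selected, rest = route(summaries, i, top_k)
        context = list(tokens)                      # the component's own tokens
        for j in selected:
            context.extend(components[j])           # routed: full tokens
        context.extend(summaries[j] for j in rest)  # unrouted: one token each
        outputs.append(attend(tokens, context))
        context_sizes.append(len(context))
    return outputs, context_sizes


# Toy usage: 6 components, 4 tokens each, 8-dimensional features.
random.seed(0)
N, L, D = 6, 4, 8
components = [
    [[random.gauss(0, 1) for _ in range(D)] for _ in range(L)]
    for _ in range(N)
]
outputs, sizes = sparse_global_attention(components, top_k=2)
# Each context holds 4 + 2*4 + 3 = 15 tokens, versus 24 for dense attention.
```

In this toy setup every component still receives some signal from all others, but only the routed components contribute their full token sets; the rest contribute a single pooled token each.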

Problem

Research questions and friction points this paper is trying to address.

Addresses poor scalability in part-aware 3D generation due to quadratic attention costs
Introduces importance-based component routing and compression for efficient 3D asset creation
Enables fine-grained compositional 3D generation with a scalable number of components
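
To make the scaling argument concrete, here is a rough cost comparison. The symbols and numbers are assumptions for illustration and do not come from the paper: N components, L tokens per component, k routed components per query component, and m compressed tokens per unrouted component. The cost of computing the routing scores is ignored.

```latex
\begin{aligned}
C_{\text{dense}}  &\propto (N L)^2 \\
C_{\text{sparse}} &\propto N L \,\bigl(L + kL + (N - 1 - k)\,m\bigr) \\[4pt]
\text{Example: }  & N = 100,\ L = 512,\ k = 4,\ m = 1 \\
\text{dense context per component}  &= N L = 51200 \\
\text{sparse context per component} &= 512 + 4 \cdot 512 + 95 \cdot 1 = 2655
\end{aligned}
```

Under these assumed numbers each query token sees roughly 19 times fewer keys. A term quadratic in N remains, but it is scaled by m rather than L, so it stays small as long as the compressed representation is much shorter than a full component.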

Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses importance-based component routing for sparse attention
Compresses unimportant components to reduce computational cost
Enables scalable compositional 3D generation with many components

Authors

Zhiqi Li: PhD, Nanjing University; computer vision
Wenhuan Li: Tencent Hunyuan
Tengfei Wang: Tencent Hunyuan
Zhenwei Wang: Tencent Hunyuan
Junta Wu: Tencent Hunyuan
Haoyuan Wang: University of Pennsylvania, Applied Mathematics and Computational Science; Biostatistics
Yunhan Yang: Tencent Hunyuan
Zehuan Huang: Beihang University; Generative Model, Computer Vision
Yang Li: Tencent Hunyuan
Chunchao Guo: Tencent Hunyuan
Peidong Liu: Westlake University; 3D computer vision, Robotics