Ultra3D: Efficient and High-Fidelity 3D Generation with Part Attention

📅 2025-07-23
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing sparse voxel generation methods suffer from the quadratic complexity of global attention in their two-stage diffusion pipelines, which severely limits efficiency. This paper introduces Ultra3D, an efficient and high-fidelity 3D generation framework. Its core contributions are: (1) leveraging the compact VecSet representation to generate a coarse object layout in the first stage, reducing token count and accelerating voxel coordinate prediction; (2) Part Attention, a geometry-aware localized attention mechanism that restricts attention computation to semantically consistent part regions, drastically reducing computational overhead while preserving structural continuity; and (3) a scalable part-level annotation pipeline that converts raw meshes into part-labeled sparse voxels to support this mechanism. Experiments demonstrate that Ultra3D achieves up to 6.7× acceleration in latent feature generation, enables high-quality 3D modeling at 1024 resolution, and attains state-of-the-art performance in both visual fidelity and user preference.

📝 Abstract
Recent advances in sparse voxel representations have significantly improved the quality of 3D content generation, enabling high-resolution modeling with fine-grained geometry. However, existing frameworks suffer from severe computational inefficiencies due to the quadratic complexity of attention mechanisms in their two-stage diffusion pipelines. In this work, we propose Ultra3D, an efficient 3D generation framework that significantly accelerates sparse voxel modeling without compromising quality. Our method leverages the compact VecSet representation to efficiently generate a coarse object layout in the first stage, reducing token count and accelerating voxel coordinate prediction. To refine per-voxel latent features in the second stage, we introduce Part Attention, a geometry-aware localized attention mechanism that restricts attention computation within semantically consistent part regions. This design preserves structural continuity while avoiding unnecessary global attention, achieving up to 6.7x speed-up in latent generation. To support this mechanism, we construct a scalable part annotation pipeline that converts raw meshes into part-labeled sparse voxels. Extensive experiments demonstrate that Ultra3D supports high-resolution 3D generation at 1024 resolution and achieves state-of-the-art performance in both visual fidelity and user preference.
Problem

Research questions and friction points this paper is trying to address.

Computational inefficiency of existing 3D generation frameworks due to quadratic attention cost
Accelerating sparse voxel modeling without sacrificing quality
Achieving high-resolution 3D generation while preserving structural continuity
Innovation

Methods, ideas, or system contributions that make the work stand out.

VecSet representation for efficient coarse layout
Part Attention for localized feature refinement
Scalable part annotation pipeline for voxels
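The core idea of Part Attention, as described in the abstract, is to compute attention only among voxel tokens that share a part label instead of over all tokens globally. A minimal NumPy sketch of this kind of part-restricted attention follows; the function name, shapes, and looping strategy are my assumptions for illustration, not the authors' implementation:

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def part_attention(q, k, v, part_ids):
    """Attention restricted to tokens sharing a part label.

    q, k, v: (N, d) token features; part_ids: (N,) integer part labels.
    Equivalent to global attention under a block mask that only admits
    token pairs within the same part, so the cost scales with the sum
    of squared part sizes rather than N^2.
    """
    d = q.shape[-1]
    out = np.empty_like(v)
    for p in np.unique(part_ids):
        idx = np.where(part_ids == p)[0]
        # Full attention, but only inside this part's token set.
        scores = q[idx] @ k[idx].T / np.sqrt(d)   # (n_p, n_p)
        out[idx] = softmax(scores) @ v[idx]
    return out
```

With parts of size n_1, …, n_P, the score matrices cost Σ n_p² instead of N², which is where a speed-up of the kind reported (up to 6.7×) can come from when parts are small relative to the whole object; the exact mechanism and masking scheme in Ultra3D may differ.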
👥 Authors
Yiwen Chen (Ph.D. student, S-Lab, Nanyang Technological University; Computer Vision, 3D Generation, Generative Models)
Zhihao Li (Nanyang Technological University; Math Magic)
Yikai Wang (Tsinghua University)
Hu Zhang (Math Magic)
Qin Li (Math Magic; School of Artificial Intelligence, Beijing Normal University)
Chi Zhang (Westlake University)
Guosheng Lin (Nanyang Technological University; Computer Vision, Machine Learning)