🤖 AI Summary
Problem: Existing octree-based diffusion models treat 3D shapes as monolithic entities and ignore semantic part-whole hierarchies, which limits generalization and makes high-resolution modeling computationally expensive. Method: The paper proposes HierOctFusion, a part-aware multi-scale octree diffusion model that generates shapes in layers, from local parts to global structure. Specifically: (1) a cross-attention conditioning scheme injects part-level information so that semantic features propagate across hierarchical levels from parts to the whole; (2) a 3D dataset with part category annotations is constructed using a pre-trained segmentation model; and (3) hierarchical feature interaction is strengthened across octree scales to capture fine-grained, sparse structures. Results: Experiments report better shape quality and efficiency than prior methods.
📝 Abstract
3D content generation remains a fundamental yet challenging task due to the inherent structural complexity of 3D data. Recent octree-based diffusion models offer a promising balance between efficiency and quality through hierarchical generation, but they overlook two key points: 1) they typically model 3D objects as holistic entities, ignoring semantic part hierarchies and limiting generalization; and 2) holistic high-resolution modeling is computationally expensive, whereas real-world objects are inherently sparse and hierarchical, which makes them well suited to layered generation. Motivated by these observations, we propose HierOctFusion, a part-aware multi-scale octree diffusion model that enhances hierarchical feature interaction for generating fine-grained and sparse object structures. We introduce a cross-attention conditioning mechanism that injects part-level information into the generation process, enabling semantic features to propagate across hierarchical levels from parts to the whole. To facilitate training and evaluation, we also construct a 3D dataset with part category annotations using a pre-trained segmentation model. Experiments demonstrate that HierOctFusion achieves superior shape quality and efficiency compared to prior methods.
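To make the cross-attention conditioning idea concrete, the following is a minimal sketch of single-head cross-attention in which octree node features act as queries and part-level embeddings act as keys and values. It is an illustration under stated assumptions, not HierOctFusion's actual layer: the function name `part_cross_attention`, the feature dimensions, the random weights, and the residual update are all hypothetical choices made for this example, since the abstract does not specify them.

```python
import math
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row):
    """Numerically stable softmax over one list of scores."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def part_cross_attention(node_feats, part_feats, w_q, w_k, w_v):
    """Hypothetical part-conditioning layer (assumed design, not the paper's).

    node_feats: one feature vector per octree node (queries).
    part_feats: one embedding per semantic part (keys and values).
    Returns the updated node features and the node-to-part attention weights.
    """
    q = matmul(node_feats, w_q)
    k = matmul(part_feats, w_k)
    v = matmul(part_feats, w_v)
    scale = math.sqrt(len(q[0]))
    # Scaled dot-product score between every node query and every part key.
    scores = [[sum(a * b for a, b in zip(qi, kj)) / scale for kj in k] for qi in q]
    attn = [softmax(row) for row in scores]
    update = matmul(attn, v)
    # Residual connection: part information is added to the node features.
    out = [[n + u for n, u in zip(n_row, u_row)]
           for n_row, u_row in zip(node_feats, update)]
    return out, attn


def rand_matrix(rows, cols, rng, scale=1.0):
    """Random Gaussian matrix, used here only as placeholder data and weights."""
    return [[rng.gauss(0.0, 1.0) * scale for _ in range(cols)] for _ in range(rows)]


# Toy sizes chosen for illustration only.
NUM_NODES, NUM_PARTS, D_NODE, D_PART, D_KEY = 8, 3, 16, 12, 16
rng = random.Random(0)
node_feats = rand_matrix(NUM_NODES, D_NODE, rng)
part_feats = rand_matrix(NUM_PARTS, D_PART, rng)
w_q = rand_matrix(D_NODE, D_KEY, rng, scale=0.1)
w_k = rand_matrix(D_PART, D_KEY, rng, scale=0.1)
w_v = rand_matrix(D_PART, D_NODE, rng, scale=0.1)

out, attn = part_cross_attention(node_feats, part_feats, w_q, w_k, w_v)
```

Each row of `attn` is a distribution over parts for one octree node, so a node can draw mostly on the part it belongs to while still seeing the others. In a real model the projections would be learned, and such a layer would presumably be applied at several octree depths.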