🤖 AI Summary
This work proposes a multi-granularity 3D part generation framework that explicitly models the human creative process—from abstract to detailed—to enable interactive 3D shape synthesis aligned with coarse-to-fine reasoning. The approach introduces an iterative splitting mechanism that learns the inverse of bounding box merging to generate structured part layouts. It comprises two stages: BoxSplitGen, which learns sequences of bounding box splits for controllable multi-granularity structure generation, and Box-to-Shape, a conditional diffusion model that translates these bounding boxes into fine-grained 3D geometries. Experiments demonstrate that BoxSplitGen outperforms conventional token-prediction and unconditional inpainting methods in structural quality, while the full system achieves superior controllability and detail fidelity compared to existing approaches.
📝 Abstract
Human creativity follows a perceptual process, moving from abstract ideas to finer details during creation. While 3D generative models have advanced dramatically, models specifically designed to assist human imagination in 3D creation -- particularly for detailing abstractions from coarse to fine -- remain unexplored. We propose a framework that enables intuitive and interactive 3D shape generation by iteratively splitting coarse bounding boxes into progressively finer part layouts. The main technical components of our framework are two generative models: the box-splitting generative model and the box-to-shape generative model. The first model, named BoxSplitGen, generates a collection of 3D part bounding boxes with varying granularity by iteratively splitting coarse bounding boxes. It utilizes part bounding boxes created through agglomerative merging and learns the reverse of the merging process -- the splitting sequences. The model consists of two main components: the first learns the categorical distribution over which box to split, and the second learns the distribution of the two new boxes, given the current set of boxes and the selected box. The second model, the box-to-shape generative model, is trained by leveraging the 3D shape priors learned by an existing 3D diffusion model while adapting that model to incorporate bounding box conditioning. In our experiments, we demonstrate that the box-splitting generative model outperforms token prediction models and an inpainting approach built on an unconditional diffusion model. We also show that our box-to-shape model, based on a state-of-the-art 3D diffusion model, provides superior results compared to a previous model.
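The iterative split loop described in the abstract can be sketched in a few lines. This is a minimal illustration, not the paper's method: the names `split_box` and `iterative_split` are hypothetical, and the two learned components (the categorical model over which box to split, and the model over the two resulting boxes) are stood in for by random choices, here a uniform pick of the box and a random cut along its longest axis.

```python
import random

def split_box(box, ratio, axis):
    """Split an axis-aligned box into two children along `axis` at `ratio`.
    A box is ((xmin, ymin, zmin), (xmax, ymax, zmax))."""
    lo, hi = box
    cut = lo[axis] + ratio * (hi[axis] - lo[axis])
    hi_a = list(hi); hi_a[axis] = cut   # upper corner of the first child
    lo_b = list(lo); lo_b[axis] = cut   # lower corner of the second child
    return (lo, tuple(hi_a)), (tuple(lo_b), hi)

def iterative_split(root_box, n_steps, rng=random):
    """Coarse-to-fine loop: at each step, choose a box to split (a learned
    categorical distribution in the paper; uniform random here) and replace
    it with two child boxes (a learned box-pair distribution in the paper;
    a random longest-axis cut here)."""
    boxes = [root_box]
    for _ in range(n_steps):
        idx = rng.randrange(len(boxes))                     # which box to split
        lo, hi = boxes[idx]
        axis = max(range(3), key=lambda a: hi[a] - lo[a])   # longest axis
        ratio = rng.uniform(0.3, 0.7)                       # where to cut
        a, b = split_box(boxes[idx], ratio, axis)
        boxes[idx:idx + 1] = [a, b]                         # replace parent with children
    return boxes

layout = iterative_split(((0, 0, 0), (1, 1, 1)), n_steps=4)
print(len(layout))  # 5 boxes: each split adds exactly one box
```

Because each step replaces one box with two, the layout grows by exactly one part per split, which is what makes the granularity directly controllable by the number of steps. In the full system, each resulting box would then condition the box-to-shape diffusion model to produce the part geometry.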