🤖 AI Summary
Existing 3D assembly methods model the task solely as rigid pose estimation, struggling with missing parts and neglecting global shape semantics. This work introduces 3D generative modeling into the assembly problem for the first time, proposing a joint optimization framework in which structure-aware pose prediction and holistic shape generation mutually reinforce each other. By simultaneously inferring part poses and completing missing geometry, the method integrates part-level priors with global shape context, enabling cooperative reasoning between assembly and generation. Evaluated on complex real-world objects with missing components, the approach significantly outperforms existing pure pose estimation methods and achieves state-of-the-art performance.
📝 Abstract
Most existing 3D assembly methods treat the problem as pure pose estimation, rearranging observed parts via rigid transformations. In contrast, human assembly naturally couples structural reasoning with holistic shape inference. Inspired by this intuition, we reformulate 3D assembly as a joint problem of assembly and generation. We show that these two processes are mutually reinforcing: assembly provides part-level structural priors for generation, while generation injects holistic shape context that resolves ambiguities in assembly. Unlike prior methods that cannot synthesize missing geometry, we propose CRAG, which simultaneously generates plausible complete shapes and predicts poses for input parts. Extensive experiments demonstrate state-of-the-art performance across in-the-wild objects with diverse geometries, varying part counts, and missing pieces. Our code and models will be released.
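The coupling described above, where assembly supplies part-level structure and generation supplies global shape context, can be pictured as a single joint objective. The sketch below is purely illustrative: the paper does not disclose its losses or architecture here, so the chamfer-style coupling term, the function names, and the weighting scheme are all assumptions for exposition, not CRAG's actual implementation.

```python
# Hypothetical sketch of a joint assembly-generation objective.
# All names and loss terms below are assumptions, not the paper's method.
import numpy as np

def chamfer(a, b):
    """Symmetric chamfer distance between point sets a:(N,3) and b:(M,3)."""
    d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)
    return d.min(axis=1).mean() + d.min(axis=0).mean()

def joint_loss(part_points, poses, generated_shape, lam=1.0):
    """Couple pose prediction and holistic shape generation.

    part_points:     list of (N_i, 3) arrays, one per observed part
    poses:           list of (R, t) rigid transforms, one per part
    generated_shape: (M, 3) points of a generated complete shape
    """
    # Apply each predicted rigid transform to its part.
    assembled = np.concatenate(
        [pts @ R.T + t for pts, (R, t) in zip(part_points, poses)]
    )
    # Coupling term: transformed parts should agree with the generated
    # complete shape, which also covers geometry the input parts lack.
    l_assembly = chamfer(assembled, generated_shape)
    # Generation term (placeholder): e.g. a reconstruction or diffusion
    # loss on the generated shape; set to zero in this toy sketch.
    l_generation = 0.0
    return l_assembly + lam * l_generation
```

Under this view, lowering the coupling term simultaneously pulls part poses toward a globally consistent arrangement and pulls the generated shape toward the observed parts, which is one way to realize the mutual reinforcement the abstract describes.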