🤖 AI Summary
This work addresses the challenge of reconstructing multi-object 3D scenes from sparse RGB-D observations, where severe occlusion, partial visibility, and object symmetry pose significant difficulties. The authors propose RecGen, a generative framework that combines compositional synthetic scene generation with strong 3D shape priors to jointly estimate the shapes and poses of objects and their parts. Despite using nearly 80% fewer training meshes than the previous state of the art, SAM3D, RecGen outperforms it by 30.1% in geometric shape quality, 9.1% in texture reconstruction, and 33.9% in pose estimation, establishing state-of-the-art performance on complex, heavily occluded scenes.
📝 Abstract
Accurately reconstructing complete, complex multi-object scenes from sparse observations remains a core challenge in computer vision and a key step toward scalable and reliable simulation for robotics. In this work, we introduce RecGen, a generative framework for probabilistic joint estimation of object and part shapes, as well as their poses, under occlusion and partial visibility from one or more RGB-D images. By leveraging compositional synthetic scene generation and strong 3D shape priors, RecGen generalizes across diverse object types and real-world environments. RecGen achieves state-of-the-art performance on complex, heavily occluded datasets, robustly handling severe occlusions, symmetric objects, object parts, and intricate geometry and texture. Despite using nearly 80% fewer training meshes than the previous state of the art, SAM3D, RecGen outperforms it by 30.1% in geometric shape quality, 9.1% in texture reconstruction, and 33.9% in pose estimation.