🤖 AI Summary
Zero-shot monocular 3D shape reconstruction degrades significantly in real-world scenarios because of inaccurate object segmentation and severe occlusion. To address this, we propose the first end-to-end occlusion-aware joint segmentation-reconstruction framework. Methodologically: (1) we unify segmentation and reconstruction into a single task and design a lightweight network that optimizes both jointly; (2) we develop a scalable synthetic data pipeline that explicitly models diverse object-occluder-background configurations; (3) our model is trained exclusively on synthetic data, requiring neither real 3D ground-truth annotations nor fine-tuning. Experiments show state-of-the-art zero-shot performance on real images with substantially fewer parameters than existing methods. Moreover, our approach is robust to segmentation errors and heavy occlusion, enabling reliable reconstruction under challenging real-world conditions.
📝 Abstract
Recent monocular 3D shape reconstruction methods have shown promising zero-shot results on object-segmented images without any occlusions. However, their effectiveness is significantly compromised in real-world conditions due to imperfect object segmentation by off-the-shelf models and the prevalence of occlusions. To address these issues, we propose a unified regression model that integrates segmentation and reconstruction and is specifically designed for occlusion-aware 3D shape reconstruction. To enable reconstruction in the wild, we also introduce a scalable data synthesis pipeline that simulates a wide range of variations in objects, occluders, and backgrounds. Training on our synthetic data enables the proposed model to achieve state-of-the-art zero-shot results on real-world images while using significantly fewer parameters than competing approaches.
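The abstract does not describe how the synthesis pipeline combines objects, occluders, and backgrounds. The sketch below shows one plausible building block, standard back-to-front alpha ("over") compositing, and is not the authors' implementation. It assumes each layer is an RGBA array with values in [0, 1]. The helper name `composite_sample`, the 0.5 mask threshold, and the choice to return both a full (amodal) and a visible object mask are illustrative assumptions.

```python
import numpy as np

def composite_sample(background, obj_rgba, occ_rgba, threshold=0.5):
    """Hypothetical helper: layer an object and an occluder over a background.

    background: (H, W, 3) float array in [0, 1]
    obj_rgba, occ_rgba: (H, W, 4) float arrays in [0, 1], alpha in the last channel
    Returns the composited image, the amodal (full) object mask, the visible
    object mask, and the fraction of the object hidden by the occluder.
    """
    obj_rgb, obj_a = obj_rgba[..., :3], obj_rgba[..., 3:]
    occ_rgb, occ_a = occ_rgba[..., :3], occ_rgba[..., 3:]

    # Back-to-front "over" compositing: background, then object, then occluder.
    image = obj_a * obj_rgb + (1.0 - obj_a) * background
    image = occ_a * occ_rgb + (1.0 - occ_a) * image

    # Amodal mask: every object pixel; visible mask: object pixels not covered.
    amodal = obj_a[..., 0] > threshold
    visible = amodal & ~(occ_a[..., 0] > threshold)

    n_amodal = amodal.sum()
    occluded_fraction = 1.0 - visible.sum() / n_amodal if n_amodal else 0.0
    return image, amodal, visible, float(occluded_fraction)
```

For example, a red object covering the left two columns of a 4×4 image, with a blue occluder over the second column, gives 8 amodal pixels, 4 visible pixels, and an occluded fraction of 0.5. Varying the occluder's size and position in a loop like this is one simple way to control occlusion severity in generated samples.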