🤖 AI Summary
Single-image 3D reconstruction under occlusion suffers from inconsistent novel-view synthesis and degraded geometric quality due to locally missing structures. Method: We introduce the first occlusion-aware 3D reconstruction benchmark and propose a self-supervised multi-view diffusion framework that jointly learns occlusion completion and view-consistent modeling end-to-end. Our approach leverages diffusion models to synthesize six structurally coherent views, trained on occluded-unoccluded image pairs from the Pix2Gestalt dataset without manual annotations; we further incorporate structure-aware pseudo-ground-truth modeling to enhance geometric fidelity. Contribution/Results: Experiments demonstrate significant improvements in novel-view synthesis consistency and 3D reconstruction accuracy across diverse real-world occlusion scenarios. The code is publicly available.
📝 Abstract
Reconstructing 3D objects from a single image is a long-standing challenge, especially under real-world occlusions. While recent diffusion-based view synthesis models can generate consistent novel views from a single RGB image, they generally assume fully visible inputs and fail when parts of the object are occluded. This leads to inconsistent views and degraded 3D reconstruction quality. To overcome this limitation, we propose an end-to-end framework for occlusion-aware multi-view generation. Our method directly synthesizes six structurally consistent novel views from a single partially occluded image, enabling downstream 3D reconstruction without requiring prior inpainting or manual annotations. We construct a self-supervised training pipeline using the Pix2Gestalt dataset, leveraging occluded-unoccluded image pairs and pseudo-ground-truth views to teach the model structure-aware completion and view consistency. Without modifying the original architecture, we fully fine-tune the view synthesis model to jointly learn completion and multi-view generation. Additionally, we introduce the first benchmark for occlusion-aware reconstruction, encompassing diverse occlusion levels, object categories, and mask patterns. This benchmark provides a standardized protocol for evaluating future methods under partial occlusions. Our code is available at https://github.com/Quyans/DeOcc123.
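The self-supervised pairing strategy described in the abstract can be sketched in a few lines: occlude a fully visible image to produce the network input, and render pseudo-ground-truth views from the unoccluded image as the six training targets. This is a minimal toy sketch under our own assumptions; the function names, masking scheme, and renderer stub are illustrative stand-ins, not the authors' actual code.

```python
import random

NUM_VIEWS = 6  # the paper's model synthesizes six novel views per input


def make_training_pair(unoccluded_img, mask_fn, render_views_fn):
    """Build one self-supervised sample.

    The occluded version of the image becomes the model input, while
    views rendered from the unoccluded image serve as pseudo-ground-truth
    targets -- no manual annotation is required.
    """
    occluded_input = mask_fn(unoccluded_img)
    target_views = render_views_fn(unoccluded_img, NUM_VIEWS)
    return occluded_input, target_views


# Hypothetical stand-ins: an "image" here is just a flat list of pixels.
def random_mask(img, frac=0.3, seed=0):
    """Zero out a random fraction of pixels to simulate occlusion."""
    rng = random.Random(seed)
    out = list(img)
    for i in rng.sample(range(len(out)), int(frac * len(out))):
        out[i] = 0
    return out


def fake_renderer(img, n):
    """Placeholder for a multi-view pseudo-GT renderer."""
    return [list(img) for _ in range(n)]


inp, targets = make_training_pair([1.0] * 100, random_mask, fake_renderer)
```

In the actual pipeline, `random_mask` would be replaced by the occluded half of a Pix2Gestalt occluded-unoccluded pair, and `fake_renderer` by the pseudo-ground-truth view generation the paper describes.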