AmodalGen3D: Generative Amodal 3D Object Reconstruction from Sparse Unposed Views

📅 2025-11-26
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address incomplete and geometrically inconsistent 3D object reconstruction from sparse, pose-free, and partially occluded multi-view inputs, this paper proposes a generative 3D reconstruction framework. The method jointly models visible and invisible regions by introducing two novel attention mechanisms: view-wise cross-attention and stereo-conditioned cross-attention. These modules tightly integrate 2D amodal completion priors with multi-view stereo geometric constraints, enabling geometrically plausible and appearance-consistent inference of unobserved structures. Evaluated on both synthetic and real-world datasets, the approach significantly improves reconstruction completeness and fidelity over conventional multi-view reconstruction and single-view inpainting methods, achieving state-of-the-art performance in recovering occluded geometry while preserving structural coherence and visual consistency across views. This advances object-level 3D understanding for applications including robotic grasping and AR/VR.

📝 Abstract
Reconstructing 3D objects from a few unposed and partially occluded views is a common yet challenging problem in real-world scenarios, where many object surfaces are never directly observed. Traditional multi-view or inpainting-based approaches struggle under such conditions, often yielding incomplete or geometrically inconsistent reconstructions. We introduce AmodalGen3D, a generative framework for amodal 3D object reconstruction that infers complete, occlusion-free geometry and appearance from arbitrary sparse inputs. The model integrates 2D amodal completion priors with multi-view stereo geometry conditioning, supported by a View-Wise Cross Attention mechanism for sparse-view feature fusion and a Stereo-Conditioned Cross Attention module for unobserved structure inference. By jointly modeling visible and hidden regions, AmodalGen3D faithfully reconstructs 3D objects that are consistent with sparse-view constraints while plausibly hallucinating unseen parts. Experiments on both synthetic and real-world datasets demonstrate that AmodalGen3D achieves superior fidelity and completeness under occlusion-heavy sparse-view settings, addressing a pressing need for object-level 3D scene reconstruction in robotics, AR/VR, and embodied AI applications.
Problem

Research questions and friction points this paper is trying to address.

Reconstructs 3D objects from sparse, unposed, occluded views
Infers complete geometry and appearance from arbitrary sparse inputs
Addresses incomplete reconstructions in robotics, AR/VR, and embodied AI applications
Innovation

Methods, ideas, or system contributions that make the work stand out.

Generative framework for amodal 3D object reconstruction
Integrates 2D amodal priors with multi-view stereo conditioning
Uses cross-attention mechanisms for sparse-view fusion and inference
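
The cross-attention idea underlying both modules can be sketched generically: a set of query tokens (e.g., latent 3D tokens in the generative backbone) attends to context tokens (e.g., per-view 2D amodal or stereo features). The sketch below is illustrative only; the projection matrices, token shapes, and the name `cross_attention` are assumptions, not the paper's actual parameterization.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax over the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(query_tokens, context_tokens):
    """Generic single-head cross-attention: queries attend to context.

    In a view-wise variant (per the abstract), the context would be the
    concatenated feature tokens of all sparse input views; in a
    stereo-conditioned variant, stereo-geometry features. Projections
    here are random stand-ins for learned weights (hypothetical).
    """
    d = query_tokens.shape[-1]
    rng = np.random.default_rng(0)
    W_q = rng.standard_normal((d, d)) / np.sqrt(d)
    W_k = rng.standard_normal((d, d)) / np.sqrt(d)
    W_v = rng.standard_normal((d, d)) / np.sqrt(d)
    Q = query_tokens @ W_q        # (N_q, d)
    K = context_tokens @ W_k      # (N_c, d)
    V = context_tokens @ W_v      # (N_c, d)
    attn = softmax(Q @ K.T / np.sqrt(d))  # (N_q, N_c) attention weights
    return attn @ V               # (N_q, d) fused features

# Example: 8 latent 3D tokens attend to 2 views x 16 tokens each.
queries = np.random.default_rng(1).standard_normal((8, 32))
view_tokens = np.random.default_rng(2).standard_normal((2 * 16, 32))
fused = cross_attention(queries, view_tokens)
print(fused.shape)  # → (8, 32)
```

Because attention weights are computed jointly over tokens from all views, each query can fuse evidence across the sparse inputs, which is the intuition behind using cross-attention for sparse-view feature fusion.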