AI Summary
Existing 3D inpainting and object removal methods suffer from geometric distortions and texture inconsistencies under unconstrained viewpoints (i.e., arbitrary camera poses and trajectories). This paper introduces the first geometry-guided, multi-view framework optimized via test-time adaptation. The key contributions are: (1) fine-grained inpainting mask detection derived from object masks, which improves robustness under complex viewpoints; (2) a 3D inpainting pipeline that integrates geometric priors, coupled with a multi-view consistency refinement network; and (3) adaptation of a pre-trained image inpainting model, followed by self-supervised fine-tuning at test time. Evaluated on a newly constructed, diverse benchmark of unconstrained scenes, the method substantially outperforms state-of-the-art approaches, producing geometrically accurate, photorealistic, and cross-view consistent 3D inpainting in both forward-facing and arbitrary-view settings.
Abstract
Current 3D inpainting and object removal methods are largely limited to front-facing scenes and face substantial challenges when applied to diverse, "unconstrained" scenes where the camera orientation and trajectory are unrestricted. To bridge this gap, we introduce a novel approach that produces inpainted 3D scenes with consistent visual quality and coherent underlying geometry across both front-facing and unconstrained scenes. Specifically, we propose a robust 3D inpainting pipeline that incorporates geometric priors and a multi-view refinement network trained via test-time adaptation, building on a pre-trained image inpainting model. Additionally, we develop a novel inpainting mask detection technique that derives targeted inpainting masks from object masks, improving performance on unconstrained scenes. To validate the efficacy of our approach, we create a challenging and diverse benchmark that spans a wide range of scenes. Comprehensive experiments demonstrate that our proposed method substantially outperforms existing state-of-the-art approaches.