VEIGAR: View-consistent Explicit Inpainting and Geometry Alignment for 3D object Removal

📅 2025-06-13
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the limitations of existing 3D object removal methods—namely, their reliance on initial 3D reconstruction, multi-view geometry priors, and inconsistent appearance modeling—this paper proposes the first reconstruction-free explicit inpainting framework. Methodologically, it (1) introduces an explicit prior alignment mechanism in pixel space to enforce cross-view geometric consistency; (2) designs a scale-invariant depth loss that bypasses the scale and translation calibration inherent in monocular depth estimation; and (3) integrates a lightweight foundation model with multi-view consistency supervision. Experiments demonstrate that the approach achieves state-of-the-art performance in both reconstruction accuracy and view consistency. Moreover, it trains three times faster than the previously fastest method, significantly reducing computational overhead and deployment complexity.

📝 Abstract
Recent advances in Novel View Synthesis (NVS) and 3D generation have significantly improved editing tasks, with a primary emphasis on maintaining cross-view consistency throughout the generative process. Contemporary methods typically address this challenge using a dual-strategy framework: performing consistent 2D inpainting across all views guided by embedded priors either explicitly in pixel space or implicitly in latent space; and conducting 3D reconstruction with additional consistency guidance. Previous strategies, in particular, often require an initial 3D reconstruction phase to establish geometric structure, introducing considerable computational overhead. Even with the added cost, the resulting reconstruction quality often remains suboptimal. In this paper, we present VEIGAR, a computationally efficient framework that outperforms existing methods without relying on an initial reconstruction phase. VEIGAR leverages a lightweight foundation model to reliably align priors explicitly in the pixel space. In addition, we introduce a novel supervision strategy based on scale-invariant depth loss, which removes the need for traditional scale-and-shift operations in monocular depth regularization. Through extensive experimentation, VEIGAR establishes a new state-of-the-art benchmark in reconstruction quality and cross-view consistency, while achieving a threefold reduction in training time compared to the fastest existing method, highlighting its superior balance of efficiency and effectiveness.
Problem

Research questions and friction points this paper is trying to address.

Achieving view-consistent 3D object removal without initial reconstruction
Aligning priors explicitly in pixel space efficiently
Improving depth regularization without scale-and-shift operations
Innovation

Methods, ideas, or system contributions that make the work stand out.

Lightweight model aligns priors in pixel space
Scale-invariant depth loss removes traditional regularization
No initial 3D reconstruction phase reduces overhead
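The summary does not give VEIGAR's exact loss formulation, but the idea of supervising depth without per-image scale-and-shift calibration can be illustrated with the standard scale-invariant log-depth loss (in the style of Eigen et al., 2014), which is invariant to a global multiplicative rescaling of the predicted depth. This is a hypothetical sketch, not the paper's actual implementation:

```python
import numpy as np

def scale_invariant_depth_loss(pred, target, lam=1.0, eps=1e-8):
    """Scale-invariant log-depth loss (Eigen et al., 2014 style).

    Hypothetical sketch: VEIGAR's exact formulation is not given in this
    summary. With lam=1.0 the loss reduces to the variance of the
    per-pixel log-depth error, so multiplying `pred` by any positive
    constant leaves the loss unchanged — no scale calibration needed.
    """
    d = np.log(pred + eps) - np.log(target + eps)  # per-pixel log error
    n = d.size
    return (d ** 2).sum() / n - lam * (d.sum() ** 2) / (n ** 2)
```

For example, `scale_invariant_depth_loss(2.0 * pred, target)` equals `scale_invariant_depth_loss(pred, target)` when `lam=1.0`, since the constant log-offset cancels in the variance term.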
Pham Khai Nguyen Do
AITech Lab, Computer Science and Engineering Faculty, Ho Chi Minh City University of Technology, VNUHCM
Bao Nguyen Tran
AITech Lab, Computer Science and Engineering Faculty, Ho Chi Minh City University of Technology, VNUHCM
Nam Nguyen
AITech Lab, Computer Science and Engineering Faculty, Ho Chi Minh City University of Technology, VNUHCM
Duc Dung Nguyen
Ho Chi Minh City University of Technology (HCMUT)
Computer Vision · Sound Processing · Deep Learning · NLP