VEIGAR: View-consistent Explicit Inpainting and Geometry Alignment for 3D object Removal

📅 2025-06-13

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Existing 3D object removal methods typically depend on an initial 3D reconstruction phase to establish geometric structure, which adds considerable computational overhead and still often yields suboptimal reconstruction quality. This paper proposes VEIGAR, a framework that does not rely on that initial reconstruction phase. Methodologically: (1) it uses a lightweight foundation model to align priors explicitly in pixel space, which enforces cross-view consistency; and (2) it introduces a supervision strategy based on a scale-invariant depth loss, which removes the need for the scale-and-shift operations traditionally used in monocular depth regularization. Experiments show that VEIGAR achieves state-of-the-art reconstruction quality and cross-view consistency, while training about three times faster than the fastest existing method.

📝 Abstract

Recent advances in Novel View Synthesis (NVS) and 3D generation have significantly improved editing tasks, with a primary emphasis on maintaining cross-view consistency throughout the generative process. Contemporary methods typically address this challenge using a dual-strategy framework: performing consistent 2D inpainting across all views guided by embedded priors either explicitly in pixel space or implicitly in latent space; and conducting 3D reconstruction with additional consistency guidance. Previous strategies, in particular, often require an initial 3D reconstruction phase to establish geometric structure, introducing considerable computational overhead. Even with the added cost, the resulting reconstruction quality often remains suboptimal. In this paper, we present VEIGAR, a computationally efficient framework that outperforms existing methods without relying on an initial reconstruction phase. VEIGAR leverages a lightweight foundation model to reliably align priors explicitly in the pixel space. In addition, we introduce a novel supervision strategy based on scale-invariant depth loss, which removes the need for traditional scale-and-shift operations in monocular depth regularization. Through extensive experimentation, VEIGAR establishes a new state-of-the-art benchmark in reconstruction quality and cross-view consistency, while achieving a threefold reduction in training time compared to the fastest existing method, highlighting its superior balance of efficiency and effectiveness.

Problem

Research questions and friction points this paper is trying to address.

Achieving view-consistent 3D object removal without initial reconstruction

Aligning priors explicitly in pixel space efficiently

Improving depth regularization without scale-and-shift operations
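For background on the last point, the scale-and-shift step that VEIGAR aims to avoid is commonly implemented as a least-squares fit of a scale `s` and shift `t` that map a monocular depth prediction onto a reference depth. The sketch below illustrates that conventional step only; it is not code from the paper, and the function name `align_scale_shift` and the flat-list depth representation are assumptions made for illustration.

```python
def align_scale_shift(pred, ref):
    """Least-squares fit of s, t minimizing sum((s * pred_i + t - ref_i)^2).

    pred, ref: equal-length sequences of depth values (hypothetical inputs).
    Returns (s, t). This is the conventional alignment step, shown for
    contrast; it is NOT the supervision strategy proposed in the paper.
    """
    if len(pred) != len(ref) or len(pred) < 2:
        raise ValueError("pred and ref must have equal length >= 2")
    n = len(pred)
    mean_p = sum(pred) / n
    mean_r = sum(ref) / n
    # Closed-form simple linear regression: s = cov(pred, ref) / var(pred).
    var_p = sum((p - mean_p) ** 2 for p in pred)
    if var_p == 0:
        raise ValueError("pred is constant; scale is undetermined")
    cov = sum((p - mean_p) * (r - mean_r) for p, r in zip(pred, ref))
    s = cov / var_p
    t = mean_r - s * mean_p
    return s, t


# Example: ref is exactly 2 * pred + 3, so the fit recovers s = 2, t = 3.
s, t = align_scale_shift([1.0, 2.0, 3.0], [5.0, 7.0, 9.0])
```

A fit like this has to be recomputed for every view's depth map before a depth penalty can be applied, which is the extra calibration step a scale-invariant loss is meant to remove.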

Innovation

Methods, ideas, or system contributions that make the work stand out.

Lightweight foundation model aligns priors explicitly in pixel space

Scale-invariant depth loss removes the need for scale-and-shift operations in monocular depth regularization

Skipping the initial 3D reconstruction phase reduces computational overhead
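To make the idea of a scale-invariant depth loss concrete, here is a minimal sketch of one well-known formulation, the log-depth loss of Eigen et al. (2014). The summary above does not give VEIGAR's exact loss, so treat this as an illustrative stand-in under that assumption; the function name, the `lam` parameter, and the flat-list inputs are hypothetical. Note that this particular form is invariant to a global scale on the prediction but not to an additive shift.

```python
import math


def scale_invariant_depth_loss(pred, target, lam=1.0):
    """Scale-invariant log-depth loss (Eigen et al., 2014 form).

    pred, target: equal-length sequences of strictly positive depths.
    With d_i = log(pred_i) - log(target_i), the loss is
        mean(d_i^2) - lam * mean(d_i)^2.
    For lam = 1.0 this is the variance of d, so multiplying pred by any
    positive constant leaves the loss unchanged.
    Assumption: illustrative stand-in, not the paper's exact formulation.
    """
    if len(pred) != len(target) or not pred:
        raise ValueError("pred and target must be non-empty and equal length")
    d = [math.log(p) - math.log(t) for p, t in zip(pred, target)]
    n = len(d)
    mean_sq = sum(x * x for x in d) / n
    mean = sum(d) / n
    return mean_sq - lam * mean * mean


# A prediction that is off by a pure global scale incurs (near) zero loss.
loss_scaled = scale_invariant_depth_loss([1.0, 2.0, 4.0], [2.0, 4.0, 8.0])
```

Because a global scale factor contributes the same constant to every log-difference, it cancels in the variance, so no per-view scale fit is required before comparing predicted and reference depths.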
