ObjFiller-3D: Consistent Multi-view 3D Inpainting via Video Diffusion Models

📅 2025-08-25
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Multi-view 3D inpainting often suffers from inconsistent 2D image completions, leading to blurred textures, geometric distortions, and visual artifacts—severely limiting high-fidelity 3D object reconstruction. To address this, we introduce video diffusion models to 3D inpainting for the first time, proposing a cross-modal adaptation mechanism that bridges the semantic gap between 3D geometric representations and video temporal representations. We further design a reference-guided multi-view consistency optimization strategy to ensure spatial coherence and structural accuracy in 3D completion and editing. Evaluated on multiple benchmark datasets, our method achieves a PSNR of 26.6, significantly outperforming state-of-the-art approaches including NeRFiller and Instant3DIt. It delivers superior reconstruction quality while maintaining practical deployability.

Technology Category

Application Category

📝 Abstract
3D inpainting often relies on multi-view 2D image inpainting, where the inherent inconsistencies across different inpainted views can result in blurred textures, spatial discontinuities, and distracting visual artifacts. These inconsistencies pose significant challenges when striving for accurate and realistic 3D object completion, particularly in applications that demand high fidelity and structural coherence. To overcome these limitations, we propose ObjFiller-3D, a novel method designed for the completion and editing of high-quality and consistent 3D objects. Instead of employing a conventional 2D image inpainting model, our approach leverages a curated selection of state-of-the-art video editing model to fill in the masked regions of 3D objects. We analyze the representation gap between 3D and videos, and propose an adaptation of a video inpainting model for 3D scene inpainting. In addition, we introduce a reference-based 3D inpainting method to further enhance the quality of reconstruction. Experiments across diverse datasets show that compared to previous methods, ObjFiller-3D produces more faithful and fine-grained reconstructions (PSNR of 26.6 vs. NeRFiller (15.9) and LPIPS of 0.19 vs. Instant3dit (0.25)). Moreover, it demonstrates strong potential for practical deployment in real-world 3D editing applications. Project page: https://objfiller3d.github.io/ Code: https://github.com/objfiller3d/ObjFiller-3D .
Problem

Research questions and friction points this paper is trying to address.

Addresses multi-view inconsistency in 3D inpainting
Solves blurred textures and visual artifacts
Enhances structural coherence in 3D completion
Innovation

Methods, ideas, or system contributions that make the work stand out.

Video diffusion models for 3D inpainting
Adaptation of video inpainting to 3D
Reference-based 3D reconstruction enhancement
🔎 Similar Papers
No similar papers found.
H
Haitang Feng
Nanjing University
J
Jie Liu
Nanjing University
Jie Tang
Jie Tang
UW Madison
Computed Tomography
G
Gangshan Wu
Nanjing University
B
Beiqi Chen
Harbin Institute of Technology
Jianhuang Lai
Jianhuang Lai
Sun Yat-sen University
Guangcong Wang
Guangcong Wang
Assistant Professor, Great Bay University
Machine LearningDeep Learning3D VisionAI4Science