AI Summary
This work addresses the degradation in rendering quality of 3D Gaussian splatting under sparse input views or view extrapolation, a challenge exacerbated by existing diffusion-based methods that struggle to balance generalization and reconstruction fidelity. The authors propose a fine-tuning-free 2D–3D interleaved optimization framework that leverages pre-trained image diffusion models to guide the reconstruction of 3D Gaussians. A pixel-wise confidence mask is introduced to precisely identify and refine uncertain regions. This approach significantly enhances visual fidelity and multi-view consistency in extrapolated views, achieving performance on par with or superior to fine-tuning-based methods across multiple datasets while preserving strong generalization capabilities.
Abstract
Neural Radiance Fields and 3D Gaussian Splatting have advanced novel view synthesis, yet they still rely on dense inputs and often degrade at extrapolated views. Recent approaches leverage generative models, such as diffusion models, to provide additional supervision, but face a trade-off between generalization and fidelity: fine-tuning diffusion models for artifact removal improves fidelity but risks overfitting, while fine-tuning-free methods preserve generalization but often yield lower fidelity. We introduce FreeFix, a fine-tuning-free approach that pushes the boundary of this trade-off by enhancing extrapolated rendering with pre-trained image diffusion models. We present an interleaved 2D–3D refinement strategy, showing that image diffusion models can be leveraged for consistent refinement without relying on costly video diffusion models. Furthermore, we take a closer look at the guidance signal for 2D refinement and propose a per-pixel confidence mask to identify uncertain regions for targeted improvement. Experiments across multiple datasets show that FreeFix improves multi-frame consistency and achieves performance comparable to or surpassing fine-tuning-based methods, while retaining strong generalization ability.
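The abstract does not spell out how the per-pixel confidence mask is computed or applied. As a purely illustrative sketch (an assumption, not FreeFix's actual formulation), one common heuristic derives per-pixel uncertainty from how much a rendered view varies across several perturbed renderings, then lets the diffusion model refine only the flagged pixels:

```python
import numpy as np

def uncertainty_mask(renders: np.ndarray, tau: float = 0.05) -> np.ndarray:
    """Flag pixels whose color varies strongly across K renderings.

    renders: (K, H, W, 3) array holding the same view rendered K times
             under small perturbations. This variance-based signal is a
             hypothetical stand-in for the paper's confidence mask.
    Returns a boolean (H, W) mask: True marks an uncertain pixel.
    """
    per_pixel_std = renders.std(axis=0).mean(axis=-1)  # (H, W)
    return per_pixel_std > tau

def masked_blend(render: np.ndarray, refined: np.ndarray,
                 mask: np.ndarray) -> np.ndarray:
    """Replace only uncertain pixels with the diffusion-refined image,
    keeping confident pixels from the original rendering."""
    return np.where(mask[..., None], refined, render)
```

Under this reading, the mask makes the 2D refinement targeted: confident regions pass through untouched, which helps preserve multi-view consistency while the diffusion model repairs only the artifact-prone areas.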