High-Fidelity Novel View Synthesis via Splatting-Guided Diffusion

📅 2025-02-18
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the co-occurrence of geometric distortion and texture hallucination in novel view synthesis (NVS) from a single image or sparse views, this paper proposes SplatDiff, a framework that couples pixel-level splatting guidance with video diffusion modeling. Methodologically, the authors design an aligned synthesis strategy for precise target-viewpoint control and geometry-consistent synthesis, and introduce a texture bridge module that suppresses texture hallucination through adaptive feature fusion. SplatDiff achieves state-of-the-art performance on single-view NVS, improving both geometric fidelity and texture detail, and, without extra training, transfers in a zero-shot manner to sparse-view NVS and stereo video conversion.
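
As a rough illustration of the splatting guidance described above, the sketch below forward-splats a source image into a target view from depth and a relative camera pose, using a pinhole model and a soft z-weighting to blend points that land on the same pixel. The function name, the weighting choice, and the camera conventions are assumptions for illustration, not the paper's implementation.

```python
import torch


def pixel_splat(src_img, src_depth, K, T_src_to_tgt):
    """Forward-splat a source image into a target view (illustrative sketch).

    src_img:      (3, H, W) source colors
    src_depth:    (H, W) depth in the source view
    K:            (3, 3) shared pinhole intrinsics
    T_src_to_tgt: (4, 4) relative camera pose (source -> target)
    Returns the splatted target image and a validity mask (holes = 0).
    """
    _, H, W = src_img.shape
    device = src_img.device
    ys, xs = torch.meshgrid(
        torch.arange(H, device=device, dtype=torch.float32),
        torch.arange(W, device=device, dtype=torch.float32),
        indexing="ij",
    )

    # Back-project source pixels to 3D and move them into the target frame.
    pix = torch.stack([xs, ys, torch.ones_like(xs)], dim=0).reshape(3, -1)
    cam = torch.linalg.inv(K) @ pix * src_depth.reshape(1, -1)
    cam_h = torch.cat([cam, torch.ones(1, cam.shape[1], device=device)], dim=0)
    tgt = (T_src_to_tgt @ cam_h)[:3]

    # Project into the target image plane.
    proj = K @ tgt
    z = proj[2].clamp(min=1e-6)
    u = (proj[0] / z).round().long()
    v = (proj[1] / z).round().long()
    valid = (u >= 0) & (u < W) & (v >= 0) & (v < H) & (tgt[2] > 0)

    # Soft z-weighting: nearer points receive larger splatting weight, and
    # contributions to the same target pixel are blended deterministically.
    w = torch.exp(-z[valid] / z[valid].mean())
    idx = v[valid] * W + u[valid]
    colors = src_img.reshape(3, -1)[:, valid]
    num = torch.zeros(3, H * W, device=device).index_add_(1, idx, colors * w)
    den = torch.zeros(H * W, device=device).index_add_(0, idx, w)
    out = (num / den.clamp(min=1e-6)).reshape(3, H, W)
    mask = (den > 0).float().reshape(H, W)
    return out, mask


if __name__ == "__main__":
    H, W = 64, 64
    img = torch.rand(3, H, W)
    depth = torch.full((H, W), 2.0)
    K = torch.tensor([[50.0, 0.0, 32.0], [0.0, 50.0, 32.0], [0.0, 0.0, 1.0]])
    T = torch.eye(4)
    T[0, 3] = 0.1  # small lateral baseline
    warped, mask = pixel_splat(img, depth, K, T)
    print(warped.shape, mask.mean().item())
```

The holes and distortions in such a splatted frame are exactly what the diffusion model is asked to repair, while the valid splatted pixels pin down the target geometry.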

📝 Abstract
Despite recent advances in Novel View Synthesis (NVS), generating high-fidelity views from single or sparse observations remains a significant challenge. Existing splatting-based approaches often produce distorted geometry due to splatting errors. While diffusion-based methods leverage rich 3D priors to achieve improved geometry, they often suffer from texture hallucination. In this paper, we introduce SplatDiff, a pixel-splatting-guided video diffusion model designed to synthesize high-fidelity novel views from a single image. Specifically, we propose an aligned synthesis strategy for precise control of target viewpoints and geometry-consistent view synthesis. To mitigate texture hallucination, we design a texture bridge module that enables high-fidelity texture generation through adaptive feature fusion. In this manner, SplatDiff leverages the strengths of splatting and diffusion to generate novel views with consistent geometry and high-fidelity details. Extensive experiments verify the state-of-the-art performance of SplatDiff in single-view NVS. Additionally, without extra training, SplatDiff shows remarkable zero-shot performance across diverse tasks, including sparse-view NVS and stereo video conversion.
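
In spirit, the texture bridge with adaptive feature fusion mentioned in the abstract could look like the following minimal sketch: a learned per-pixel gate decides where to trust encoder features of the splatted frame (faithful texture) versus the diffusion decoder's features (clean geometry). Module names, layer choices, and the gating form are hypothetical assumptions, not the paper's architecture.

```python
import torch
import torch.nn as nn


class TextureBridge(nn.Module):
    """Hypothetical adaptive feature fusion block (illustrative only)."""

    def __init__(self, channels: int):
        super().__init__()
        # Per-pixel gate computed from both feature maps.
        self.gate = nn.Sequential(
            nn.Conv2d(2 * channels, channels, kernel_size=3, padding=1),
            nn.SiLU(),
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
            nn.Sigmoid(),
        )
        self.proj = nn.Conv2d(channels, channels, kernel_size=1)

    def forward(self, dec_feat: torch.Tensor, splat_feat: torch.Tensor) -> torch.Tensor:
        # dec_feat, splat_feat: (B, C, H, W) at the same resolution.
        g = self.gate(torch.cat([dec_feat, splat_feat], dim=1))
        # Where the splatted texture is reliable (g -> 1), prefer it;
        # elsewhere keep the decoder's generated content (g -> 0).
        return dec_feat + g * (self.proj(splat_feat) - dec_feat)


if __name__ == "__main__":
    bridge = TextureBridge(channels=64)
    dec = torch.randn(1, 64, 32, 32)
    spl = torch.randn(1, 64, 32, 32)
    print(bridge(dec, spl).shape)  # torch.Size([1, 64, 32, 32])
```

The design intuition is that hallucination is suppressed wherever splatted evidence exists, while the generative pathway still fills disocclusions.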
Problem

Research questions and friction points this paper is trying to address.

High-fidelity novel view synthesis from single or sparse observations
Texture hallucination in diffusion-based methods
Geometric distortion from splatting errors
Innovation

Methods, ideas, or system contributions that make the work stand out.

SplatDiff combines splatting and diffusion (see the conditioning sketch after this list)
Aligned synthesis ensures geometry consistency
Texture bridge module reduces hallucination
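
Below is a minimal sketch of how splatted frames might condition the video diffusion denoiser, assuming the common scheme of channel-wise concatenation of the noisy latent, the encoded splatted frame, and its validity mask. The denoiser here is a toy stand-in; the actual SplatDiff conditioning and network may differ.

```python
import torch
import torch.nn as nn


class TinyDenoiser(nn.Module):
    """Toy stand-in for the video diffusion U-Net (purely illustrative)."""

    def __init__(self, in_ch: int, out_ch: int):
        super().__init__()
        self.net = nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1)

    def forward(self, x, t):
        # A real denoiser would also embed the timestep t and attend over frames.
        return self.net(x)


def denoise_step(denoiser, noisy, splat, mask, t):
    """One conditioned denoising step: concatenate the noisy latent, the
    encoded splatted frame, and its validity mask along channels."""
    cond = torch.cat([noisy, splat, mask], dim=1)  # (B, 2C + 1, h, w) per frame
    return denoiser(cond, t)


if __name__ == "__main__":
    C, h, w = 4, 32, 32
    denoiser = TinyDenoiser(in_ch=2 * C + 1, out_ch=C)
    noisy = torch.randn(2, C, h, w)
    splat = torch.randn(2, C, h, w)
    mask = torch.ones(2, 1, h, w)
    print(denoise_step(denoiser, noisy, splat, mask, t=torch.tensor([10])).shape)
```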
Authors
Xiang Zhang
ETH Zürich, Switzerland and DisneyResearch|Studios, Switzerland
Yang Zhang
DisneyResearch|Studios, Switzerland
Lukas Mehl
PhD student, University of Stuttgart (Computer Vision)
Markus Gross
ETH Zürich, Switzerland and DisneyResearch|Studios, Switzerland
Christopher Schroers
Principal Research Scientist, Disney Research|Studios