FaithFusion: Harmonizing Reconstruction and Generation via Pixel-wise Information Gain

📅 2025-11-26
📈 Citations: 0
✨ Influential: 0
📄 PDF
🤖 AI Summary
To address concurrent geometric distortion and photorealistic inconsistency in driving-scene reconstruction under large viewpoint variations, this paper proposes a plug-and-play fusion framework integrating 3D Gaussian Splatting (3DGS) with diffusion models. The core innovation is the first introduction of pixel-wise Expected Information Gain (EIG) as a spatial uncertainty prior, dynamically guiding the diffusion model to prioritize challenging regions for refinement. A pixel-wise weighted bidirectional feedback mechanism further refines the 3DGS geometry using the generated outputs, without architectural modifications or external priors. Evaluated on the Waymo dataset, the method achieves state-of-the-art performance: it outperforms prior works across the NTA-IoU, NTL-IoU, and FID metrics; notably, under an extreme 6-meter lane offset, it attains an FID of 107.47, significantly mitigating geometric drift and over-correction artifacts.

๐Ÿ“ Abstract
In controllable driving-scene reconstruction and 3D scene generation, maintaining geometric fidelity while synthesizing visually plausible appearance under large viewpoint shifts is crucial. However, effective fusion of geometry-based 3DGS and appearance-driven diffusion models faces inherent challenges, as the absence of pixel-wise, 3D-consistent editing criteria often leads to over-restoration and geometric drift. To address these issues, we introduce extbf{FaithFusion}, a 3DGS-diffusion fusion framework driven by pixel-wise Expected Information Gain (EIG). EIG acts as a unified policy for coherent spatio-temporal synthesis: it guides diffusion as a spatial prior to refine high-uncertainty regions, while its pixel-level weighting distills the edits back into 3DGS. The resulting plug-and-play system is free from extra prior conditions and structural modifications.Extensive experiments on the Waymo dataset demonstrate that our approach attains SOTA performance across NTA-IoU, NTL-IoU, and FID, maintaining an FID of 107.47 even at 6 meters lane shift. Our code is available at https://github.com/wangyuanbiubiubiu/FaithFusion.
Problem

Research questions and friction points this paper is trying to address.

Maintaining geometric fidelity while synthesizing plausible appearance under viewpoint shifts
Fusing geometry-based 3DGS with appearance-driven diffusion models effectively
Addressing over-restoration and geometric drift in 3D scene reconstruction and generation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Pixel-wise Expected Information Gain fusion framework
Guides diffusion as spatial prior for refinement
Distills edits into 3DGS via pixel weighting
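The fusion idea above can be sketched as a per-pixel blend: regions where the 3DGS render is uncertain (high EIG) lean on the diffusion output, while confident regions keep the render. The sketch below is a minimal illustration under assumptions, not the authors' implementation; the function name `eig_weighted_fusion` and the min-max normalization of the EIG map are hypothetical choices for clarity.

```python
import numpy as np

def eig_weighted_fusion(render, diffused, eig, eps=1e-8):
    """Blend a diffusion refinement into a 3DGS render per pixel.

    render:   (H, W, 3) image rendered from 3DGS
    diffused: (H, W, 3) diffusion-refined image
    eig:      (H, W) per-pixel Expected Information Gain (higher = more uncertain)
    """
    # Normalize EIG to [0, 1] so it can serve as a blend weight
    # (a hypothetical choice; the paper's exact weighting may differ).
    w = (eig - eig.min()) / (eig.max() - eig.min() + eps)
    w = w[..., None]  # broadcast the weight over the RGB channels
    # High-EIG pixels follow the diffusion output; low-EIG pixels keep the render.
    return (1.0 - w) * render + w * diffused

# Toy example: 2x2 image with uncertainty concentrated in one pixel.
render = np.zeros((2, 2, 3))    # black render
diffused = np.ones((2, 2, 3))   # white diffusion output
eig = np.array([[0.0, 0.0],
                [0.0, 1.0]])    # only the bottom-right pixel is uncertain
fused = eig_weighted_fusion(render, diffused, eig)
```

In this toy case only the uncertain bottom-right pixel takes the diffusion value; the same weight map could, in principle, also serve as the per-pixel loss weight when distilling the edited image back into the 3DGS parameters.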
Authors

YuAn Wang (Baidu Inc.)
Xiaofan Li (East China Normal University)
Chi Huang (Baidu Inc.)
Wenhao Zhang (Baidu Inc., Nanjing University)
Hao Li (Baidu Inc.)
Bosheng Wang (Baidu Inc.)
Xun Sun (Baidu Inc.)
Jun Wang (Baidu Inc.)