VDEGaussian: Video Diffusion Enhanced 4D Gaussian Splatting for Dynamic Urban Scenes Modeling

📅 2025-08-04
🤖 AI Summary
To address spatiotemporal inconsistency in dynamic urban scene modeling—caused by reconstruction distortion of fast-moving objects, temporal discontinuity, and insufficient sampling—this paper proposes a 4D Gaussian Splatting framework integrated with video diffusion priors. The method leverages test-time adaptive video diffusion models to extract temporally coherent motion priors, and introduces a joint timestamp optimization and uncertainty distillation mechanism to achieve precise pose alignment and faithful preservation of dynamic details. Experimental results demonstrate significant improvements in novel-view synthesis quality: PSNR increases by approximately 2 dB over baseline methods on standard benchmarks, with notable gains in geometric accuracy and visual fidelity for high-speed moving objects. This work establishes a new paradigm for efficient and robust 4D reconstruction of dynamic scenes.

📝 Abstract
Dynamic urban scene modeling is a rapidly evolving area with broad applications. While current approaches leveraging neural radiance fields or Gaussian Splatting have achieved fine-grained reconstruction and high-fidelity novel view synthesis, they still face significant limitations. These often stem from a dependence on pre-calibrated object tracks or difficulties in accurately modeling fast-moving objects from undersampled capture, particularly due to challenges in handling temporal discontinuities. To overcome these issues, we propose a novel video diffusion-enhanced 4D Gaussian Splatting framework. Our key insight is to distill robust, temporally consistent priors from a test-time adapted video diffusion model. To ensure precise pose alignment and effective integration of this denoised content, we introduce two core innovations: a joint timestamp optimization strategy that refines interpolated frame poses, and an uncertainty distillation method that adaptively extracts target content while preserving well-reconstructed regions. Extensive experiments demonstrate that our method significantly enhances dynamic modeling, especially for fast-moving objects, achieving an approximate PSNR gain of 2 dB for novel view synthesis over baseline approaches.
Problem

Research questions and friction points this paper is trying to address.

Dynamic urban scene modeling with temporal consistency
Handling fast-moving objects in undersampled captures
Improving novel view synthesis for dynamic scenes
Innovation

Methods, ideas, or system contributions that make the work stand out.

Video diffusion-enhanced 4D Gaussian Splatting framework
Joint timestamp optimization for pose alignment
Uncertainty distillation for adaptive content extraction
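The uncertainty distillation idea above can be illustrated with a minimal sketch. This is not the paper's implementation; it assumes a simple per-pixel scheme where reconstruction error acts as an uncertainty proxy, so poorly reconstructed pixels are pulled toward the diffusion-denoised frame while well-reconstructed regions keep their original supervision. The function name, the `beta` sharpness parameter, and the error-to-weight mapping are all illustrative assumptions.

```python
import numpy as np

def uncertainty_distillation_loss(rendered, denoised, gt, beta=5.0):
    """Hypothetical per-pixel uncertainty-weighted distillation sketch.

    rendered : (H, W, 3) frame rendered by the 4D Gaussian model
    denoised : (H, W, 3) temporally coherent frame from the video
               diffusion prior
    gt       : (H, W, 3) captured ground-truth frame
    beta     : assumed sharpness of the error-to-weight mapping
    All values are floats in [0, 1].
    """
    # Per-pixel reconstruction error as a proxy for uncertainty.
    err = np.abs(rendered - gt).mean(axis=-1, keepdims=True)  # (H, W, 1)
    # Map error to a [0, 1] weight: high error -> trust the diffusion prior.
    w = 1.0 - np.exp(-beta * err)
    # Blend supervision targets per pixel, preserving well-fit regions.
    target = w * denoised + (1.0 - w) * gt
    return float(np.mean((rendered - target) ** 2))
```

When the render already matches the capture, the weight collapses to zero and the diffusion prior has no influence; the prior only fills in where the model is uncertain, which matches the stated goal of "preserving well-reconstructed regions."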
👥 Authors
Yuru Xiao — Harbin Institute of Technology
Zihan Lin — Researcher, Xiaohongshu (Recommender Systems)
Chao Lu — Mach Drive
Deming Zhai — Harbin Institute of Technology
Kui Jiang — Harbin Institute of Technology (Computer Vision, Image Processing, Deep Learning)
Wenbo Zhao — Harbin Institute of Technology
Wei Zhang — Harbin Institute of Technology
Junjun Jiang — Harbin Institute of Technology (Image Processing, Computer Vision, Machine Learning)
Huanran Wang — Mach Drive
Xianming Liu — Harbin Institute of Technology