Hybrid Gaussian Splatting for Novel Urban View Synthesis

📅 2025-10-14

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This paper addresses novel view synthesis (NVS) in urban street scenes: given car-centric frames captured during training traversals, the task is to render the same environment as seen from a different traversal, such as another lane or driving direction. The authors propose a two-stage method that combines 3D Gaussian splatting with a fine-tuned single-step diffusion model. In the first stage, a 3D reconstruction of the scene is fitted and novel views are rendered from the target cameras. In the second stage, the diffusion model enhances the rendered frames. The paper discusses the initialization of the Gaussian primitives, the fine-tuning of the enhancer, and the curation of its training data, and ablates each component in terms of PSNR, SSIM, and LPIPS. On the public leaderboard of the ICCV 2025 RealADSim-NVS Challenge, the method reaches an aggregated score of 0.432 and places second overall.

📝 Abstract

This paper describes the Qualcomm AI Research solution to the RealADSim-NVS challenge, hosted at the RealADSim Workshop at ICCV 2025. The challenge concerns novel view synthesis in street scenes: starting from car-centric frames captured during a set of training traversals, participants must generate renders of the same urban environment as viewed from a different traversal (e.g., a different street lane or driving direction). Our solution is inspired by hybrid methods in scene generation and generative simulators that merge Gaussian splatting and diffusion models, and it consists of two stages. First, we fit a 3D reconstruction of the scene and render novel views as seen from the target cameras. Then, we enhance the resulting frames with a dedicated single-step diffusion model. We discuss specific choices made in the initialization of Gaussian primitives, as well as the fine-tuning of the enhancer model and the curation of its training data. We report the performance of our model design and ablate its components in terms of novel view quality as measured by PSNR, SSIM, and LPIPS. On the public leaderboard reporting test results, our proposal reaches an aggregated score of 0.432, placing second overall.
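
The two-stage structure described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: `render_view` and `enhance_frame` are hypothetical callables standing in for the fitted Gaussian splatting renderer and the single-step diffusion enhancer, and the `Image` and `Pose` types are simplified placeholders chosen only to keep the example self-contained.

```python
from typing import Callable, List, Sequence

# Hypothetical placeholder types: a real system would use tensors and
# full camera intrinsics/extrinsics instead.
Image = List[List[float]]
Pose = Sequence[float]


def synthesize_novel_views(
    target_poses: Sequence[Pose],
    render_view: Callable[[Pose], Image],
    enhance_frame: Callable[[Image], Image],
) -> List[Image]:
    """Render each target pose, then enhance the render.

    render_view:   stand-in for rendering from the fitted 3D reconstruction.
    enhance_frame: stand-in for the single-step diffusion enhancer.
    """
    outputs: List[Image] = []
    for pose in target_poses:
        coarse = render_view(pose)             # stage 1: novel view from the reconstruction
        outputs.append(enhance_frame(coarse))  # stage 2: frame enhancement
    return outputs


# Toy stand-ins, used only to show how the two stages compose.
def toy_render(pose: Pose) -> Image:
    return [[float(pose[0])]]


def toy_enhance(image: Image) -> Image:
    return [[min(1.0, value + 0.5) for value in row] for row in image]


frames = synthesize_novel_views([(0.25,), (0.75,)], toy_render, toy_enhance)
```

Keeping the two stages behind separate callables mirrors the ablation mentioned in the abstract: the enhancer can be swapped for an identity function to evaluate the reconstruction alone.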

Problem

Research questions and friction points this paper is trying to address.

Synthesizing novel urban views from different street traversals

Enhancing frames rendered from a Gaussian splatting reconstruction with a single-step diffusion model

Improving novel view quality as measured by PSNR, SSIM, and LPIPS
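
Of the three metrics named above, PSNR has the simplest closed form: 10 · log10(MAX² / MSE), where MSE is the mean squared error between the reference and the rendered image. The sketch below shows that standard definition on flattened pixel sequences using only the Python standard library. It is an illustration of the metric, not the challenge's evaluation code; the function name `psnr` and the default `max_value` of 1.0 (images scaled to [0, 1]) are assumptions.

```python
import math
from typing import Sequence


def psnr(reference: Sequence[float], rendered: Sequence[float],
         max_value: float = 1.0) -> float:
    """Peak signal-to-noise ratio in dB between two flattened images.

    max_value is the largest possible pixel value (assumed 1.0 here).
    """
    if len(reference) == 0 or len(reference) != len(rendered):
        raise ValueError("images must be non-empty and have the same size")
    # Mean squared error over all pixels.
    mse = sum((r - x) ** 2 for r, x in zip(reference, rendered)) / len(reference)
    if mse == 0.0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


# A uniform error of 0.1 on a [0, 1] image gives MSE = 0.01, i.e. about 20 dB.
example_db = psnr([0.0, 0.0, 0.0, 0.0], [0.1, 0.1, 0.1, 0.1])
```

Higher PSNR and SSIM indicate better quality, whereas lower LPIPS is better, so any aggregated score has to account for the opposite direction of LPIPS. How the challenge combines the three into its aggregated score is not stated in this summary.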

Innovation

Methods, ideas, or system contributions that make the work stand out.

Hybrid Gaussian splatting with diffusion model enhancement

Two-stage scene reconstruction and frame refinement

Optimized Gaussian primitive initialization and enhancer finetuning
