🤖 AI Summary
This paper addresses novel view synthesis (NVS) in urban street scenes from vehicle-mounted camera frames captured during training traversals. We propose a two-stage method that combines 3D Gaussian splatting with a fine-tuned one-step diffusion model. In the first stage, Gaussian splatting is initialized and optimized using vehicle trajectory data to obtain an efficient, geometrically consistent scene reconstruction, from which novel views are rendered at the target cameras. In the second stage, a lightweight, fine-tuned one-step diffusion model enhances the rendered frames, improving texture fidelity and visual quality while preserving structure. The approach emphasizes practical training-data construction and a lightweight model to balance computational cost against reconstruction quality. On the public leaderboard of the ICCV 2025 RealADSim-NVS Challenge, our method ranks second overall with an aggregated score of 0.432, with novel-view quality measured by PSNR, SSIM, and LPIPS.
📝 Abstract
This paper describes the Qualcomm AI Research solution to the RealADSim-NVS challenge, hosted at the RealADSim Workshop at ICCV 2025. The challenge concerns novel view synthesis in street scenes: starting from car-centric frames captured during a set of training traversals, participants must render the same urban environment as seen from a different traversal (e.g., a different street lane or driving direction). Our solution is inspired by hybrid methods in scene generation and generative simulators that merge Gaussian splatting and diffusion models, and it consists of two stages. First, we fit a 3D reconstruction of the scene and render novel views as seen from the target cameras. Then, we enhance the resulting frames with a dedicated single-step diffusion model. We discuss specific choices made in the initialization of the Gaussian primitives, as well as the fine-tuning of the enhancer model and the curation of its training data. We report the performance of our model design and ablate its components in terms of novel view quality, as measured by PSNR, SSIM, and LPIPS. On the public leaderboard reporting test results, our proposal reaches an aggregated score of 0.432, achieving second place overall.
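For readers unfamiliar with the metrics named above, the following is a minimal sketch of PSNR, the simplest of the three. It uses the standard definition, 10·log10(MAX² / MSE), and is not the challenge's evaluation code. The function name `psnr`, the flat-list image representation, and the toy pixel values are all assumptions made for illustration. How the challenge combines PSNR, SSIM, and LPIPS into the aggregated score is not stated here and is not modeled.

```python
import math
from typing import Sequence


def psnr(reference: Sequence[float], rendered: Sequence[float],
         max_value: float = 1.0) -> float:
    """Peak signal-to-noise ratio in dB between two equally sized images.

    Images are given as flat sequences of pixel intensities in
    [0, max_value] (an illustrative simplification, not a real image API).
    """
    if len(reference) != len(rendered):
        raise ValueError("images must have the same number of pixels")
    if not reference:
        raise ValueError("images must not be empty")
    # Mean squared error over all pixels.
    mse = sum((r - s) ** 2 for r, s in zip(reference, rendered)) / len(reference)
    if mse == 0.0:
        return math.inf  # identical images: PSNR is unbounded
    return 10.0 * math.log10(max_value ** 2 / mse)


# Toy 2x2 grayscale "images", flattened row by row (hypothetical values).
ground_truth = [0.0, 0.5, 0.5, 1.0]
render = [0.1, 0.5, 0.4, 1.0]
score = psnr(ground_truth, render)
print(f"PSNR: {score:.2f} dB")
```

With these toy values the mean squared error is 0.005, so the result is about 23.0 dB. Higher PSNR and SSIM indicate a closer match to the ground-truth frame, while lower LPIPS is better.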