🤖 AI Summary
Existing NeRF and 3D Gaussian Splatting (3DGS) methods for urban novel-view synthesis suffer from scene-specific optimization, slow inference, multi-view inconsistency, and content duplication. To address these limitations, this paper proposes EVolSplat, a feed-forward volumetric 3DGS framework. Its core contributions are: (1) a unified 3D convolutional network that directly predicts Gaussian parameters across multiple frames within a shared volume; (2) a joint strategy combining noise-aware depth-guided initialization, geometric refinement in 3D space, and color prediction from 2D textures; and (3) a flexible hemisphere model for the sky and distant background. Evaluated on KITTI-360 and Waymo, EVolSplat achieves state-of-the-art rendering quality among feed-forward 3DGS- and NeRF-based approaches, enables real-time rendering, and improves cross-scene generalization and geometry-appearance consistency.
📝 Abstract
Novel view synthesis of urban scenes is essential for autonomous driving-related applications. Existing NeRF- and 3DGS-based methods achieve photorealistic renderings but require slow, per-scene optimization. We introduce EVolSplat, an efficient 3D Gaussian Splatting model for urban scenes that works in a feed-forward manner. Unlike existing feed-forward, pixel-aligned 3DGS methods, which often suffer from issues like multi-view inconsistencies and duplicated content, our approach predicts 3D Gaussians across multiple frames within a unified volume using a 3D convolutional network. This is achieved by initializing 3D Gaussians with noisy depth predictions, then refining their geometric properties in 3D space and predicting color from 2D textures. Our model also handles distant views and the sky with a flexible hemisphere background model. This enables fast, feed-forward reconstruction with real-time rendering. Experimental evaluations on the KITTI-360 and Waymo datasets show that our method achieves state-of-the-art quality compared to existing feed-forward 3DGS- and NeRF-based methods.
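The core pipeline described in the abstract — unproject noisy depth into a unified volume, then refine Gaussian properties with a 3D convolutional network instead of predicting them per pixel — can be illustrated with a minimal PyTorch sketch. The module name, feature dimension, layer widths, and the per-voxel parameter split are illustrative assumptions, not the paper's actual architecture:

```python
import torch
import torch.nn as nn

class GaussianVolumeRefiner(nn.Module):
    """Hypothetical sketch: a 3D CNN refining a Gaussian volume
    that was initialized from noisy depth predictions."""

    def __init__(self, feat_dim: int = 8):
        super().__init__()
        # 3D convolutions operate on the shared volume across frames,
        # unlike pixel-aligned methods that predict Gaussians per view
        # (a source of multi-view inconsistency and duplicated content).
        self.refine = nn.Sequential(
            nn.Conv3d(feat_dim, 16, kernel_size=3, padding=1),
            nn.ReLU(),
            # 10 channels per voxel as an illustrative Gaussian
            # parameterization: 3 position offsets + 3 scales + 4 rotation quat.
            nn.Conv3d(16, 10, kernel_size=3, padding=1),
        )

    def forward(self, volume: torch.Tensor) -> torch.Tensor:
        # volume: (B, feat_dim, D, H, W), built by unprojecting
        # multi-frame image features along noisy depth estimates.
        return self.refine(volume)

# Toy usage on a 16^3 feature volume.
vol = torch.randn(1, 8, 16, 16, 16)
params = GaussianVolumeRefiner()(vol)
print(params.shape)  # one Gaussian parameter vector per voxel
```

Color would then be decoded separately from 2D textures, and the sky handled by the hemisphere background model; neither is shown here.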