🤖 AI Summary
To address motion blur on dynamic objects and inaccurate geometric modeling of large-scale static backgrounds in autonomous driving scenes, this paper proposes a dynamic-static decoupled neural Gaussian representation. The method has three key components: (1) a region-wise voxel initialization strategy that partitions the scene into near, middle, and far regions to strengthen geometric priors and close-range detail in large-scale scenes; (2) deformable neural Gaussians whose parameters are adjusted over time by a learnable deformation network to model non-rigid dynamic actors; and (3) depth and surface-normal priors from pre-trained models that supervise the entire framework to improve geometric accuracy. Evaluated on the Waymo and KITTI datasets, the approach reduces motion blur and geometric distortion and achieves state-of-the-art novel-view synthesis quality for driving scenarios.
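Supervision by depth and normal priors is commonly written as a weighted sum of a depth error and a normal-alignment error. The sketch below is a minimal, hypothetical illustration of that idea, not DriveSplat's actual loss: it assumes a per-pixel L1 depth term and a (1 - cosine) normal term, and the function name `geometric_prior_loss` and the weights `lambda_depth` and `lambda_normal` are illustrative assumptions. Monocular depth priors are usually defined only up to scale and shift, so the sketch assumes the prior has already been aligned to the rendered depth.

```python
import math
from typing import Sequence, Tuple

Vec3 = Tuple[float, float, float]


def _normalize(v: Vec3) -> Vec3:
    """Scale a 3-vector to unit length."""
    norm = math.sqrt(sum(c * c for c in v))
    if norm == 0.0:
        raise ValueError("normal vector has zero length")
    return (v[0] / norm, v[1] / norm, v[2] / norm)


def geometric_prior_loss(
    rendered_depth: Sequence[float],
    prior_depth: Sequence[float],
    rendered_normals: Sequence[Vec3],
    prior_normals: Sequence[Vec3],
    lambda_depth: float = 0.1,    # hypothetical weight, not from the paper
    lambda_normal: float = 0.05,  # hypothetical weight, not from the paper
) -> float:
    """Weighted mean L1 depth error plus mean (1 - cosine) normal error."""
    n = len(rendered_depth)
    if n == 0 or not (n == len(prior_depth) == len(rendered_normals) == len(prior_normals)):
        raise ValueError("inputs must be non-empty and equally sized")
    # Mean absolute depth error; assumes the prior is already scale/shift aligned.
    depth_term = sum(abs(d - p) for d, p in zip(rendered_depth, prior_depth)) / n
    # Mean (1 - cos angle) between rendered and prior normals.
    normal_term = sum(
        1.0 - sum(a * b for a, b in zip(_normalize(r), _normalize(q)))
        for r, q in zip(rendered_normals, prior_normals)
    ) / n
    return lambda_depth * depth_term + lambda_normal * normal_term
```

For two pixels with depths `[1.0, 2.0]` against a prior of `[1.0, 3.0]`, and normals `(0, 0, 1)` against priors `(0, 0, 1)` and `(0, 1, 0)`, both terms equal 0.5, so `geometric_prior_loss(..., lambda_depth=1.0, lambda_normal=1.0)` returns 1.0.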
📝 Abstract
In driving scenarios, rapidly moving vehicles, pedestrians in motion, and large-scale static backgrounds pose significant challenges for 3D scene reconstruction. Recent methods based on 3D Gaussian Splatting address motion blur by decoupling the dynamic and static components of the scene. However, these decoupling strategies do not optimize the background under adequate geometric constraints and instead rely solely on adding Gaussians to fit each training view. As a result, these models exhibit limited robustness when rendering novel views and lack an accurate geometric representation. To address these issues, we introduce DriveSplat, a high-quality reconstruction method for driving scenarios based on neural Gaussian representations with dynamic-static decoupling. To better accommodate the predominantly linear motion of driving viewpoints, we employ a region-wise voxel initialization scheme that partitions the scene into near, middle, and far regions to enhance close-range detail. Deformable neural Gaussians model non-rigid dynamic actors, and their parameters are adjusted over time by a learnable deformation network. The entire framework is further supervised by depth and normal priors from pre-trained models, which improves the accuracy of geometric structures. Evaluated on the Waymo and KITTI datasets, our method achieves state-of-the-art novel-view synthesis performance in driving scenarios.
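The region-wise voxel initialization can be pictured as voxelizing an initial point cloud with a resolution that depends on distance from the camera path. The sketch below is a minimal illustration under stated assumptions, not the paper's procedure: it assumes each point is assigned to a region by its distance to the nearest camera position on the trajectory, that nearer regions use smaller voxels, and that one anchor is placed at the centre of every occupied voxel. The region thresholds, voxel sizes, and the names `REGIONS`, `distance_to_trajectory`, and `region_wise_voxel_init` are all hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float, float]

# Hypothetical (region, max distance in metres, voxel edge length) settings.
REGIONS = (
    ("near", 20.0, 0.25),
    ("middle", 60.0, 0.5),
    ("far", math.inf, 2.0),
)


def distance_to_trajectory(p: Point, trajectory: Sequence[Point]) -> float:
    """Distance from a point to the nearest camera position on the trajectory."""
    return min(math.dist(p, c) for c in trajectory)


def region_wise_voxel_init(
    points: Sequence[Point], trajectory: Sequence[Point]
) -> Dict[str, List[Point]]:
    """Voxelize points on a coarser grid the farther they are from the trajectory."""
    occupied = {name: set() for name, _, _ in REGIONS}
    for p in points:
        dist = distance_to_trajectory(p, trajectory)
        for name, max_dist, size in REGIONS:
            if dist < max_dist:
                # Integer voxel index at this region's resolution.
                occupied[name].add(tuple(math.floor(c / size) for c in p))
                break
    # One anchor per occupied voxel, placed at the voxel centre.
    return {
        name: sorted(tuple((i + 0.5) * size for i in idx) for idx in occupied[name])
        for name, _, size in REGIONS
    }
```

With a trajectory of `[(0, 0, 0), (10, 0, 0)]`, two nearby points `(1.0, 1.0, 0.0)` and `(1.05, 1.05, 0.0)` fall into the same fine "near" voxel and yield a single anchor, while points about 30 m and 100 m away land on the coarser "middle" and "far" grids. Finer voxels close to the camera path give denser anchors where detail is most visible, which is the stated goal of the partition.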