🤖 AI Summary
To address the limited reconstruction fidelity and rendering efficiency of existing methods for simulating dynamic urban environments in autonomous driving, in particular the inability of current 3D Gaussian splatting approaches to capture fine-grained inter-frame and cross-view appearance variations, this paper proposes ArmGS, a composite driving Gaussian splatting method with multi-granularity appearance refinement. ArmGS devises a multi-level appearance modeling scheme spanning the local Gaussian, dynamic actor, and global image levels, jointly optimizing a set of transformation parameters for composite Gaussian refinement. This design captures global appearance variations between frames and camera viewpoints while also modeling fine-grained local changes of background and objects. Extensive experiments on the Waymo, KITTI, NOTR, and VKITTI2 benchmarks show that ArmGS delivers high-fidelity reconstruction with real-time rendering, outperforming state-of-the-art methods.
📝 Abstract
This work focuses on modeling dynamic urban environments for autonomous driving simulation. Contemporary data-driven methods using neural radiance fields have achieved photorealistic driving scene modeling, but they suffer from low rendering efficiency. Recently, some approaches have explored 3D Gaussian splatting for modeling dynamic urban scenes, enabling high-fidelity reconstruction and real-time rendering. However, these approaches often neglect fine-grained appearance variations between frames and camera viewpoints, leading to suboptimal results. In this work, we propose a new approach named ArmGS that exploits composite driving Gaussian splatting with multi-granularity appearance refinement for autonomous driving scene modeling. The core idea of our approach is a multi-level appearance modeling scheme that optimizes a set of transformation parameters for composite Gaussian refinement at multiple granularities, ranging from the local Gaussian level to the global image level and the dynamic actor level. This not only models global scene appearance variations between frames and camera viewpoints, but also captures local fine-grained changes of background and objects. Extensive experiments on multiple challenging autonomous driving datasets, namely Waymo, KITTI, NOTR and VKITTI2, demonstrate the superiority of our approach over state-of-the-art methods.
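To make the three-granularity idea concrete, here is a minimal illustrative sketch of applying appearance transformation parameters at the global image level, the dynamic actor level, and the local Gaussian level. The function name, the affine scale/shift parameterization, and the per-Gaussian residual offsets are assumptions for illustration only; the paper's actual refinement parameterization and optimization are not specified in this abstract.

```python
import numpy as np

def refine_appearance(colors, actor_ids, global_affine, actor_affines, local_offsets):
    """Hedged sketch of multi-granularity appearance refinement.

    colors:        (N, 3) rendered Gaussian colors in [0, 1]
    actor_ids:     (N,) actor id per Gaussian (-1 marks static background)
    global_affine: (scale, shift) applied to every Gaussian (image level)
    actor_affines: {actor_id: (scale, shift)} per dynamic actor
    local_offsets: (N, 3) per-Gaussian residual color offsets (local level)

    All parameter shapes here are hypothetical; in the paper these
    transformation parameters would be jointly optimized with the scene.
    """
    scale, shift = global_affine
    out = scale * colors + shift                 # global image-level variation
    for aid, (a_scale, a_shift) in actor_affines.items():
        mask = actor_ids == aid
        out[mask] = a_scale * out[mask] + a_shift  # dynamic actor-level variation
    out = out + local_offsets                    # local fine-grained refinement
    return np.clip(out, 0.0, 1.0)
```

A coarse-to-fine decomposition like this lets a small global transform absorb exposure or lighting changes between frames and viewpoints, while per-actor and per-Gaussian terms model only the residual local detail.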