🤖 AI Summary
To address the prohibitive computational cost of 3D Gaussian Splatting (3DGS) in large-scale street-scene reconstruction, where complexity scales superlinearly with scene size, and its reliance on hard-to-obtain ground-truth 3D bounding boxes for static-dynamic object separation, this paper proposes a lightweight street-scene 3DGS reconstruction framework. Methodologically, it (1) eliminates the need for 3D bounding boxes by introducing adaptive static-dynamic decoupling based solely on readily available 2D detection boxes; (2) employs joint local-global transformation optimization to minimize redundant geometric transformations; and (3) designs an efficient culling strategy for long-range rendering. Evaluated on videos from the Argoverse2 dataset, the proposed method achieves state-of-the-art PSNR and SSIM while reducing reconstruction time to 20–50% of that of mainstream approaches. This substantial acceleration improves practical deployability in real-world settings and scalability to large-scale urban environments.
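The summary does not spell out how the adaptive 2D-box decoupling works internally, but the core idea can be illustrated with a minimal sketch: a Gaussian (represented here by its 3D center) is flagged as potentially dynamic when its pinhole projection lands inside any 2D detection box for the current frame. The function name, intrinsics, and box values below are hypothetical, not from the paper.

```python
import numpy as np

def dynamic_mask_from_2d_boxes(points_cam, K, boxes):
    """Flag 3D points whose pinhole projection falls inside any 2D detection box.

    points_cam: (N, 3) points in camera coordinates (z > 0, looking forward)
    K:          3x3 camera intrinsics
    boxes:      iterable of (x0, y0, x1, y1) pixel-space detection boxes
    """
    uvw = (K @ points_cam.T).T          # project to homogeneous image coords
    uv = uvw[:, :2] / uvw[:, 2:3]       # perspective divide -> pixel coords
    mask = np.zeros(len(points_cam), dtype=bool)
    for x0, y0, x1, y1 in boxes:
        inside = (uv[:, 0] >= x0) & (uv[:, 0] <= x1) & \
                 (uv[:, 1] >= y0) & (uv[:, 1] <= y1)
        mask |= inside                  # union over all detection boxes
    return mask

# toy example: one point on static background, one on a detected car
K = np.array([[100.0, 0.0, 50.0],
              [0.0, 100.0, 50.0],
              [0.0, 0.0, 1.0]])
points = np.array([[0.0, 0.0, 10.0],    # projects to (50, 50) -> static
                   [2.0, 0.0, 10.0]])   # projects to (70, 50) -> inside car box
car_boxes = [(60.0, 40.0, 80.0, 60.0)]
mask = dynamic_mask_from_2d_boxes(points, K, car_boxes)
print(mask.tolist())  # [False, True]
```

In a full pipeline the mask would be refined over time (a point consistently inside boxes across frames is more likely truly dynamic), which is presumably where the "adaptive" part of the paper's design comes in.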
📝 Abstract
Recently, 3D Gaussian Splatting (3DGS) has reshaped the field of photorealistic 3D reconstruction, achieving impressive rendering quality and speed. However, when applied to large-scale street scenes, existing methods suffer from rapidly escalating per-viewpoint reconstruction costs as scene size increases, leading to significant computational overhead. After revisiting the conventional pipeline, we identify three key factors accounting for this issue: unnecessary local-to-global transformations, excessive 3D-to-2D projections, and inefficient rendering of distant content. To address these challenges, we propose S3R-GS, a 3DGS framework that Streamlines the pipeline for large-scale Street Scene Reconstruction, effectively mitigating these limitations. Moreover, most existing street 3DGS methods rely on ground-truth 3D bounding boxes to separate dynamic and static components, but 3D bounding boxes are difficult to obtain, limiting real-world applicability. To address this, we propose an alternative solution based on 2D boxes, which are easier to annotate or can be predicted by off-the-shelf vision foundation models. Together, these designs make S3R-GS readily adaptable to large, in-the-wild scenarios. Extensive experiments demonstrate that S3R-GS enhances rendering quality and significantly accelerates reconstruction. Remarkably, when applied to videos from the challenging Argoverse2 dataset, it achieves state-of-the-art PSNR and SSIM while reducing reconstruction time to below 50% (and in some cases 20%) of that of competing methods.
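The abstract identifies inefficient rendering of distant content as one source of overhead. The paper's exact culling strategy is not described here, but the simplest form of the idea is a per-view distance test that drops far-away Gaussians before projection and rasterization. The sketch below is a generic illustration under that assumption; the function name and threshold are hypothetical.

```python
import numpy as np

def cull_by_distance(centers, cam_pos, max_dist):
    """Return a boolean mask keeping only Gaussian centers within max_dist
    of the camera, so distant content is skipped before rasterization."""
    dists = np.linalg.norm(centers - cam_pos, axis=1)
    return dists <= max_dist

# toy scene: near road surface, mid-range vehicle, far building facade
centers = np.array([[0.0, 0.0, 5.0],
                    [0.0, 0.0, 80.0],
                    [0.0, 0.0, 500.0]])
keep = cull_by_distance(centers, cam_pos=np.zeros(3), max_dist=100.0)
print(keep.tolist())  # [True, True, False]
```

Because the mask is computed per viewpoint, each camera only pays projection cost for Gaussians it can meaningfully see, which is consistent with the paper's goal of keeping per-viewpoint cost from growing with total scene size.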