Reconstruction Matters: Learning Geometry-Aligned BEV Representation through 3D Gaussian Splatting

📅 2026-03-19
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the limitations of existing bird’s-eye-view (BEV) perception methods, which suffer from constrained geometric accuracy and semantic consistency due to the absence of explicit 3D geometric modeling. To overcome this, the paper introduces 3D Gaussian Splatting reconstruction into the BEV perception framework for the first time, leveraging multi-view images to explicitly construct a high-fidelity 3D scene representation and generate geometrically aligned BEV features. By effectively integrating explicit 3D geometry with semantic information, the proposed approach significantly enhances model interpretability and perception performance. State-of-the-art results on the nuScenes and Argoverse benchmarks demonstrate the efficacy and potential of explicit 3D reconstruction in advancing BEV-based perception systems.

📝 Abstract
Bird's-Eye-View (BEV) perception serves as a cornerstone for autonomous driving, offering a unified spatial representation that fuses surrounding-view images to enable reasoning for various downstream tasks, such as semantic segmentation, 3D object detection, and motion prediction. However, most existing BEV perception frameworks adopt an end-to-end training paradigm, where image features are directly transformed into the BEV space and optimized solely through downstream task supervision. This formulation treats the entire perception process as a black box, often lacking explicit 3D geometric understanding and interpretability, leading to suboptimal performance. In this paper, we claim that an explicit 3D representation matters for accurate BEV perception, and we propose Splat2BEV, a Gaussian Splatting-assisted framework for BEV tasks. Splat2BEV aims to learn BEV feature representations that are both semantically rich and geometrically precise. We first pre-train a Gaussian generator that explicitly reconstructs 3D scenes from multi-view inputs, enabling the generation of geometry-aligned feature representations. These representations are then projected into the BEV space to serve as inputs for downstream tasks. Extensive experiments on the nuScenes and Argoverse datasets demonstrate that Splat2BEV achieves state-of-the-art performance and validate the effectiveness of incorporating explicit 3D reconstruction into BEV perception.
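The abstract describes projecting geometry-aligned Gaussian features into the BEV space for downstream tasks. The paper's actual projection is not shown here; the following is a minimal illustrative sketch of one plausible step, mean-pooling per-Gaussian features into a BEV grid by their (x, y) centers. The function name `splat_to_bev` and its parameters are hypothetical, not from the paper.

```python
import numpy as np

def splat_to_bev(centers, feats, bev_range=50.0, grid=128):
    """Scatter per-Gaussian features into a BEV grid by (x, y) position.

    This is an assumed, simplified pooling step, not Splat2BEV's method.
    centers: (N, 3) Gaussian means in ego coordinates (x, y, z), metres.
    feats:   (N, C) per-Gaussian feature vectors.
    Returns a (grid, grid, C) BEV feature map, mean-pooled per cell.
    """
    n, c = feats.shape
    # Map x, y in [-bev_range, bev_range) to integer cell indices.
    ij = np.floor((centers[:, :2] + bev_range) / (2 * bev_range) * grid).astype(int)
    # Drop Gaussians that fall outside the BEV extent.
    valid = np.all((ij >= 0) & (ij < grid), axis=1)
    ij, f = ij[valid], feats[valid]
    bev = np.zeros((grid, grid, c))
    count = np.zeros((grid, grid, 1))
    # Unbuffered scatter-add so repeated cell indices accumulate correctly.
    np.add.at(bev, (ij[:, 0], ij[:, 1]), f)
    np.add.at(count, (ij[:, 0], ij[:, 1]), 1.0)
    return bev / np.maximum(count, 1.0)
```

A real system would likely use a learned, differentiable projection rather than hard binning, but the sketch conveys why explicit 3D Gaussian centers make the resulting BEV features geometrically aligned: each feature lands in the cell its reconstructed 3D position dictates.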
Problem

Research questions and friction points this paper is trying to address.

BEV perception
3D geometric understanding
explicit 3D representation
autonomous driving
interpretability
Innovation

Methods, ideas, or system contributions that make the work stand out.

3D Gaussian Splatting
Bird's-Eye-View (BEV) perception
geometry-aligned representation
explicit 3D reconstruction
autonomous driving