BevSplat: Resolving Height Ambiguity via Feature-Based Gaussian Primitives for Weakly-Supervised Cross-View Localization

📅 2025-02-13
🤖 AI Summary
In weakly supervised cross-view localization, ground images carry no depth information and satellite height maps are unavailable, which leads to height ambiguity when synthesizing a bird's-eye-view (BEV) representation. To address this, we propose BevSplat, which represents each ground-image pixel as a 3D Gaussian primitive carrying semantic and spatial features; these primitives are synthesized into a BEV feature map used for relative pose estimation. For panoramic query images, we further introduce an icosphere-based supervision strategy for the Gaussian primitives. The approach avoids both the flat-ground-plane assumption and the complex cross-view transformers used in earlier work. On the KITTI and VIGOR benchmarks, which together cover pinhole and panoramic query images, the method significantly improves localization accuracy over prior approaches.

📝 Abstract
This paper addresses the problem of weakly supervised cross-view localization, where the goal is to estimate the pose of a ground camera relative to a satellite image when only noisy ground-truth annotations are available. A common approach to bridging the cross-view domain gap for pose estimation is Bird's-Eye View (BEV) synthesis. However, existing methods struggle with height ambiguity because ground images lack depth information and satellite height maps are unavailable. Previous solutions either assume a flat ground plane or rely on complex models, such as cross-view transformers. We propose BevSplat, a novel method that resolves height ambiguity by using feature-based Gaussian primitives. Each pixel in the ground image is represented by a 3D Gaussian with semantic and spatial features, and these Gaussians are synthesized into a BEV feature map for relative pose estimation. Additionally, to address challenges with panoramic query images, we introduce an icosphere-based supervision strategy for the Gaussian primitives. We validate our method on the widely used KITTI and VIGOR datasets, which include both pinhole and panoramic query images. Experimental results show that BevSplat significantly improves localization accuracy over prior approaches.
Problem

Research questions and friction points this paper is trying to address.

Height ambiguity in BEV synthesis, caused by missing depth in ground images and missing satellite height maps.
Estimating the pose of a ground camera relative to a satellite image under noisy ground-truth annotations.
Handling both pinhole and panoramic query images.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Feature-based Gaussian primitives resolve height ambiguity.
Icosphere-based supervision for panoramic query images.
BEV feature map synthesis for pose estimation.
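
The abstract does not spell out how the icosphere enters the supervision, so the following sketch only shows what an icosphere is: a regular icosahedron whose faces are repeatedly subdivided, with every new vertex pushed back onto the unit sphere. The result samples viewing directions far more evenly than an equirectangular pixel grid, which oversamples the poles. The function name `icosphere` and its interface are assumptions made for this illustration.

```python
import math

def _normalize(v):
    """Scale a 3-vector to unit length."""
    n = math.sqrt(sum(c * c for c in v))
    return tuple(c / n for c in v)

def icosphere(subdivisions=1):
    """Return (vertices, faces) of a unit icosphere.

    vertices: list of unit-length (x, y, z) tuples.
    faces: list of (i, j, k) vertex-index triples.
    """
    t = (1.0 + math.sqrt(5.0)) / 2.0  # golden ratio
    raw = [(-1, t, 0), (1, t, 0), (-1, -t, 0), (1, -t, 0),
           (0, -1, t), (0, 1, t), (0, -1, -t), (0, 1, -t),
           (t, 0, -1), (t, 0, 1), (-t, 0, -1), (-t, 0, 1)]
    verts = [_normalize(v) for v in raw]
    faces = [(0, 11, 5), (0, 5, 1), (0, 1, 7), (0, 7, 10), (0, 10, 11),
             (1, 5, 9), (5, 11, 4), (11, 10, 2), (10, 7, 6), (7, 1, 8),
             (3, 9, 4), (3, 4, 2), (3, 2, 6), (3, 6, 8), (3, 8, 9),
             (4, 9, 5), (2, 4, 11), (6, 2, 10), (8, 6, 7), (9, 8, 1)]

    for _ in range(subdivisions):
        cache = {}  # shared edge -> index of its midpoint vertex

        def midpoint(a, b):
            key = (min(a, b), max(a, b))
            if key not in cache:
                mid = tuple((p + q) / 2.0 for p, q in zip(verts[a], verts[b]))
                verts.append(_normalize(mid))  # project back onto the sphere
                cache[key] = len(verts) - 1
            return cache[key]

        new_faces = []
        for a, b, c in faces:
            ab, bc, ca = midpoint(a, b), midpoint(b, c), midpoint(c, a)
            # Each triangle splits into four smaller ones.
            new_faces += [(a, ab, ca), (b, bc, ab), (c, ca, bc), (ab, bc, ca)]
        faces = new_faces

    return verts, faces
```

Each subdivision level quadruples the face count, so the level is a direct knob for how densely the sphere of viewing directions is sampled.
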