GSFeatLoc: Visual Localization Using Feature Correspondence on 3D Gaussian Splatting

šŸ“… 2025-04-29
šŸ“ˆ Citations: 0
✨ Influential: 0
šŸ¤– AI Summary
This work addresses visual localization under large initial pose uncertainty and proposes the first feature-level localization method built on a 3D Gaussian Splatting (3DGS) scene representation. Given a coarse initial pose, the method renders an RGB-D image from the 3DGS model, extracts and matches 2D features (e.g., SIFT or SuperPoint) between the query and synthetic images, lifts the 2D–2D correspondences to 2D–3D matches using the rendered depth, and refines the pose via Perspective-n-Point (PnP). Unlike conventional photometric-optimization-based approaches, this feature-matching pipeline yields substantial gains in both efficiency and robustness. Across three datasets with 38 scenes and over 2,700 test images, inference takes as little as 0.1 seconds (a speedup of more than two orders of magnitude over photometric baselines), and 90% of images from the Synthetic NeRF and Mip-NeRF360 datasets achieve rotation and translation errors below 5° and 0.05 scene-normalized units, respectively. The method tolerates initial pose errors of up to 55° in rotation and 1.1 units in translation.
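The match-lift-PnP stage described above maps naturally onto standard OpenCV primitives. The sketch below is illustrative only: it assumes a pinhole intrinsic matrix K, a camera-to-world initial pose (R_init, t_init), and a synthetic RGB-D render already produced from the 3DGS model; all function and variable names are hypothetical, not taken from the authors' implementation.

```python
# A minimal, illustrative sketch of the match-lift-PnP stage, assuming:
#   - OpenCV with SIFT, a pinhole intrinsic matrix K (3x3),
#   - a synthetic RGB-D render (rgb_synth BGR uint8, depth_synth float)
#     already produced from the 3DGS model at the initial pose,
#   - (R_init, t_init) given in the camera-to-world convention.
# All names are hypothetical; this is not the authors' implementation.
import cv2
import numpy as np

def localize(query_gray, rgb_synth, depth_synth, K, R_init, t_init):
    # 1. Detect and describe features in both images (SIFT here; the paper
    #    also considers learned features such as SuperPoint).
    sift = cv2.SIFT_create()
    kp_q, des_q = sift.detectAndCompute(query_gray, None)
    synth_gray = cv2.cvtColor(rgb_synth, cv2.COLOR_BGR2GRAY)
    kp_s, des_s = sift.detectAndCompute(synth_gray, None)

    # 2. Match descriptors and keep matches passing Lowe's ratio test.
    good = []
    for pair in cv2.BFMatcher().knnMatch(des_q, des_s, k=2):
        if len(pair) == 2 and pair[0].distance < 0.75 * pair[1].distance:
            good.append(pair[0])

    # 3. Lift matched synthetic pixels to 3D world points using the rendered
    #    depth: X_cam = d * K^{-1} [u, v, 1]^T, then apply the initial pose.
    pts2d, pts3d = [], []
    K_inv = np.linalg.inv(K)
    for m in good:
        u, v = kp_s[m.trainIdx].pt
        d = depth_synth[int(round(v)), int(round(u))]
        if d <= 0:  # skip pixels without valid rendered depth
            continue
        X_cam = d * (K_inv @ np.array([u, v, 1.0]))
        pts3d.append(R_init @ X_cam + t_init)
        pts2d.append(kp_q[m.queryIdx].pt)
    if len(pts3d) < 4:  # PnP needs at least four correspondences
        return False, None, None

    # 4. Solve PnP with RANSAC for the refined (world-to-camera) query pose.
    ok, rvec, tvec, inliers = cv2.solvePnPRansac(
        np.asarray(pts3d, dtype=np.float64),
        np.asarray(pts2d, dtype=np.float64),
        K, None, reprojectionError=3.0)
    R_est, _ = cv2.Rodrigues(rvec)
    return ok, R_est, tvec
```

In the paper this runs against an image rendered by the 3DGS rasterizer, but any renderer producing registered color and depth would slot into the same interface.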

Technology Category

Application Category

šŸ“ Abstract
In this paper, we present a method for localizing a query image with respect to a precomputed 3D Gaussian Splatting (3DGS) scene representation. First, the method uses 3DGS to render a synthetic RGBD image at some initial pose estimate. Second, it establishes 2D-2D correspondences between the query image and this synthetic image. Third, it uses the depth map to lift the 2D-2D correspondences to 2D-3D correspondences and solves a perspective-n-point (PnP) problem to produce a final pose estimate. Results from evaluation across three existing datasets with 38 scenes and over 2,700 test images show that our method significantly reduces both inference time (by over two orders of magnitude, from more than 10 seconds to as fast as 0.1 seconds) and estimation error compared to baseline methods that use photometric loss minimization. Results also show that our method tolerates large errors in the initial pose estimate of up to 55° in rotation and 1.1 units in translation (normalized by scene scale), achieving final pose errors of less than 5° in rotation and 0.05 units in translation on 90% of images from the Synthetic NeRF and Mip-NeRF360 datasets and on 42% of images from the more challenging Tanks and Temples dataset.
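For concreteness, the lifting step in the abstract can be written in one line. Assuming a pinhole camera with intrinsic matrix $K$ and a camera-to-world initial pose $(R_0, t_0)$ (the convention here is an assumption; the paper may use the inverse), a matched pixel $(u, v)$ in the synthetic image with rendered depth $d$ back-projects as

$$
\mathbf{X}_{\mathrm{cam}} = d\,K^{-1}\begin{bmatrix} u \\ v \\ 1 \end{bmatrix},
\qquad
\mathbf{X}_{\mathrm{world}} = R_0\,\mathbf{X}_{\mathrm{cam}} + t_0 ,
$$

and PnP then recovers the query pose that minimizes the reprojection error of the $\mathbf{X}_{\mathrm{world}}$ points against their 2D matches in the query image.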
Problem

Research questions and friction points this paper is trying to address.

Localizing query images against a precomputed 3D Gaussian Splatting scene representation
Reducing inference time and pose estimation errors significantly
Tolerating large initial pose errors for accurate localization
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses 3D Gaussian Splatting as the scene representation
Establishes 2D-2D correspondences between the query image and a synthetic RGB-D rendering
Lifts matches to 2D-3D using depth and solves a PnP problem for the final pose estimate (the error metrics used to judge it are sketched below)
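The headline numbers (5° rotation, 0.05 units translation) rely on standard pose-error metrics. Below is a minimal sketch of how such errors are typically computed; the geodesic rotation angle and Euclidean translation distance are conventional choices, and the helper is hypothetical rather than taken from the authors' evaluation code.

```python
# Standard pose-error metrics matching the thresholds quoted above
# (5 deg rotation, 0.05 scene-normalized units translation).
# Hypothetical helper; not the authors' evaluation code.
import numpy as np

def pose_errors(R_est, t_est, R_gt, t_gt):
    # Geodesic rotation error: the angle of the relative rotation
    # R_est^T R_gt, recovered from its trace and clipped for safety.
    cos_angle = (np.trace(R_est.T @ R_gt) - 1.0) / 2.0
    rot_err_deg = np.degrees(np.arccos(np.clip(cos_angle, -1.0, 1.0)))
    # Translation error: Euclidean distance, assumed already
    # normalized by scene scale.
    trans_err = np.linalg.norm(np.asarray(t_est) - np.asarray(t_gt))
    return rot_err_deg, trans_err
```

An image then counts as successfully localized when rot_err_deg < 5 and trans_err < 0.05, the criterion met by 90% of Synthetic NeRF and Mip-NeRF360 images.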
Jongwon Lee
Department of Aerospace Engineering, University of Illinois Urbana-Champaign, Urbana, IL 61801, USA
Timothy Bretl
Professor of Aerospace Engineering, University of Illinois at Urbana-Champaign
robotics, robotic manipulation, optimal control, neuroscience