GRLoc: Geometric Representation Regression for Visual Localization

📅 2025-11-17
📈 Citations: 0
Influential: 0
🤖 AI Summary
In visual localization, Absolute Pose Regression (APR) models often suffer from limited generalization due to end-to-end black-box learning that lacks explicit 3D geometric understanding. To address this, we propose Geometric Representation Regression (GRR), a novel paradigm that abandons direct regression of 6-DoF poses. Instead, GRR separately regresses ray direction bundles (encoding rotation) and point maps (encoding translation), and integrates a differentiable deterministic geometric solver for end-to-end joint optimization in the world coordinate system. Crucially, GRR is the first method to leverage the inverse process of novel view synthesis for pose estimation—explicitly decoupling geometric representation learning from pose solving while embedding strong geometric priors. Evaluated on 7-Scenes and Cambridge Landmarks, GRR achieves state-of-the-art performance, delivering significant improvements in both absolute pose accuracy and cross-scene robustness.

📝 Abstract
Absolute Pose Regression (APR) has emerged as a compelling paradigm for visual localization. However, APR models typically operate as black boxes, directly regressing a 6-DoF pose from a query image, which can lead to memorizing training views rather than understanding 3D scene geometry. In this work, we propose a geometrically-grounded alternative. Inspired by novel view synthesis, which renders images from intermediate geometric representations, we reformulate APR as its inverse that regresses the underlying 3D representations directly from the image, and we name this paradigm Geometric Representation Regression (GRR). Our model explicitly predicts two disentangled geometric representations in the world coordinate system: (1) a ray bundle's directions to estimate camera rotation, and (2) a corresponding pointmap to estimate camera translation. The final 6-DoF camera pose is then recovered from these geometric components using a differentiable deterministic solver. This disentangled approach, which separates the learned visual-to-geometry mapping from the final pose calculation, introduces a strong geometric prior into the network. We find that the explicit decoupling of rotation and translation predictions measurably boosts performance. We demonstrate state-of-the-art performance on 7-Scenes and Cambridge Landmarks datasets, validating that modeling the inverse rendering process is a more robust path toward generalizable absolute pose estimation.
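The abstract's final pose-recovery step, a differentiable deterministic solver applied to the two predicted representations, has well-known closed forms. The sketch below is an illustrative reconstruction under our own assumptions, not the authors' implementation: rotation is recovered from predicted world-frame ray directions via the Kabsch/Wahba SVD solution against the known camera-frame rays, and translation as the least-squares intersection point of the predicted rays through the pointmap. Both steps are built from SVD and linear solves, so they remain differentiable for end-to-end training as the abstract requires.

```python
import numpy as np

def rotation_from_rays(cam_dirs, world_dirs):
    """Kabsch/Wahba solution: find R minimizing sum_i ||w_i - R c_i||^2,
    where c_i are unit rays in the camera frame (from intrinsics) and
    w_i are the network-predicted unit rays in the world frame."""
    B = world_dirs.T @ cam_dirs            # 3x3 correlation matrix
    U, _, Vt = np.linalg.svd(B)
    d = np.sign(np.linalg.det(U @ Vt))     # guard against reflections
    return U @ np.diag([1.0, 1.0, d]) @ Vt

def translation_from_pointmap(world_dirs, points):
    """The camera center t lies on every ray p_i = t + s_i * w_i, so it is
    the least-squares intersection of the bundle of world-space lines."""
    A = np.zeros((3, 3))
    b = np.zeros(3)
    for w, p in zip(world_dirs, points):
        P = np.eye(3) - np.outer(w, w)     # projector orthogonal to ray w
        A += P
        b += P @ p
    return np.linalg.solve(A, b)           # 3x3 system, unique if rays differ
```

On synthetic noiseless data (rays rotated by a known R, pointmap generated along those rays from a known camera center), both solvers recover the ground-truth pose to numerical precision, which is what makes this decomposition a deterministic solver rather than a learned regressor.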
Problem

Research questions and friction points this paper is trying to address.

Replaces black-box pose regression with geometric scene understanding
Predicts disentangled 3D representations for camera pose calculation
Improves visual localization by modeling inverse rendering process
Innovation

Methods, ideas, or system contributions that make the work stand out.

Regresses geometric representations from images
Predicts disentangled ray directions and pointmaps
Uses differentiable solver for pose calculation
Authors: Changyang Li, Xuejian Ma, Lixiang Liu, Zhan Li, Qingan Yan, Yi Xu (Goertek Alpha Labs)

Topics: Computer Vision, Computer Graphics, AI, Robotics, Machine Learning