GRLoc: Geometric Representation Regression for Visual Localization

📅 2025-11-17
📈 Citations: 0
Influential: 0
🤖 AI Summary
In visual localization, Absolute Pose Regression (APR) models often suffer from limited generalization due to end-to-end black-box learning that lacks explicit 3D geometric understanding. To address this, we propose Geometric Representation Regression (GRR), a novel paradigm that abandons direct regression of 6-DoF poses. Instead, GRR separately regresses ray direction bundles (encoding rotation) and point maps (encoding translation), and integrates a differentiable deterministic geometric solver for end-to-end joint optimization in the world coordinate system. Crucially, GRR is the first method to leverage the inverse process of novel view synthesis for pose estimation—explicitly decoupling geometric representation learning from pose solving while embedding strong geometric priors. Evaluated on 7-Scenes and Cambridge Landmarks, GRR achieves state-of-the-art performance, delivering significant improvements in both absolute pose accuracy and cross-scene robustness.

📝 Abstract
Absolute Pose Regression (APR) has emerged as a compelling paradigm for visual localization. However, APR models typically operate as black boxes, directly regressing a 6-DoF pose from a query image, which can lead to memorizing training views rather than understanding 3D scene geometry. In this work, we propose a geometrically-grounded alternative. Inspired by novel view synthesis, which renders images from intermediate geometric representations, we reformulate APR as its inverse that regresses the underlying 3D representations directly from the image, and we name this paradigm Geometric Representation Regression (GRR). Our model explicitly predicts two disentangled geometric representations in the world coordinate system: (1) a ray bundle's directions to estimate camera rotation, and (2) a corresponding pointmap to estimate camera translation. The final 6-DoF camera pose is then recovered from these geometric components using a differentiable deterministic solver. This disentangled approach, which separates the learned visual-to-geometry mapping from the final pose calculation, introduces a strong geometric prior into the network. We find that the explicit decoupling of rotation and translation predictions measurably boosts performance. We demonstrate state-of-the-art performance on 7-Scenes and Cambridge Landmarks datasets, validating that modeling the inverse rendering process is a more robust path toward generalizable absolute pose estimation.
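The abstract's final pose-recovery step, a differentiable deterministic solver applied to the two predicted representations, has well-known closed forms. The sketch below is an illustrative reconstruction under our own assumptions, not the authors' implementation: rotation is recovered from predicted world-frame ray directions via the Kabsch/Wahba SVD solution against the known camera-frame rays, and translation as the least-squares intersection point of the predicted rays through the pointmap. Both steps are built from SVD and linear solves, so they remain differentiable for end-to-end training as the abstract requires.

```python
import numpy as np

def rotation_from_rays(cam_dirs, world_dirs):
    """Kabsch/Wahba solution: find R minimizing sum_i ||w_i - R c_i||^2,
    where c_i are unit rays in the camera frame (from intrinsics) and
    w_i are the network-predicted unit rays in the world frame."""
    B = world_dirs.T @ cam_dirs            # 3x3 correlation matrix
    U, _, Vt = np.linalg.svd(B)
    d = np.sign(np.linalg.det(U @ Vt))     # guard against reflections
    return U @ np.diag([1.0, 1.0, d]) @ Vt

def translation_from_pointmap(world_dirs, points):
    """The camera center t lies on every ray p_i = t + s_i * w_i, so it is
    the least-squares intersection of the bundle of world-space lines."""
    A = np.zeros((3, 3))
    b = np.zeros(3)
    for w, p in zip(world_dirs, points):
        P = np.eye(3) - np.outer(w, w)     # projector orthogonal to ray w
        A += P
        b += P @ p
    return np.linalg.solve(A, b)           # 3x3 system, unique if rays differ
```

On synthetic noiseless data (rays rotated by a known R, pointmap generated along those rays from a known camera center), both solvers recover the ground-truth pose to numerical precision, which is what makes this decomposition a deterministic solver rather than a learned regressor.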
Problem

Research questions and friction points this paper is trying to address.

Replaces black-box pose regression with geometric scene understanding
Predicts disentangled 3D representations for camera pose calculation
Improves visual localization by modeling inverse rendering process
Innovation

Methods, ideas, or system contributions that make the work stand out.

Regresses geometric representations from images
Predicts disentangled ray directions and pointmaps
Uses differentiable solver for pose calculation
Authors: Changyang Li, Xuejian Ma, Lixiang Liu, Zhan Li, Qingan Yan, Yi Xu (Goertek Alpha Labs)

Topics: Computer Vision, Computer Graphics, AI, Robotics, Machine Learning