GeLoc3r: Enhancing Relative Camera Pose Regression with Geometric Consistency Regularization

📅 2025-09-26

📈 Citations: 0

✨ Influential: 0


🤖 AI Summary

ReLoc3R achieves fast inference (25 ms) and state-of-the-art regression accuracy, but its internal geometric representations contain inconsistencies that keep it below the accuracy ceiling of correspondence-based methods such as MASt3R. To address this, GeLoc3r introduces explicit geometric constraints into the regression framework through Geometric Consistency Regularization (GCR). During training, GCR uses ground-truth depth to generate dense 3D-2D correspondences, weights them with a FusionTransformer that learns the importance of each correspondence, and estimates a geometrically consistent pose via weighted RANSAC, yielding a consistency loss that supervises the regression network. Because this regularization is applied only during training, it adds no inference overhead, and the regression head implicitly absorbs geometric priors from matching-based approaches. On CO3Dv2, RealEstate10K, and MegaDepth1500, GeLoc3r consistently outperforms ReLoc3R, with the largest gain on CO3Dv2 (40.45% vs. 34.85% AUC@5°, a 16% relative improvement), approaching MASt3R's accuracy while keeping ReLoc3R's 25 ms inference time.
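To make the correspondence-generation step concrete, here is a minimal NumPy sketch of one standard way to turn ground-truth depth into 3D-2D pairs: back-project every valid pixel of image 1 using its depth and intrinsics, move the points into camera 2 with the ground-truth relative pose, and project them into image 2. The function names, the (R_21, t_21) convention (camera-1 frame to camera-2 frame), the pixel-bounds convention, and the absence of an occlusion test against image 2's depth are illustrative assumptions, not details taken from the paper.

```python
import numpy as np


def backproject(depth, K):
    """Lift every pixel with positive depth to a 3D point in the camera frame.

    Returns (points (N, 3), pixel coordinates (N, 2) as (u, v)).
    """
    h, w = depth.shape
    v, u = np.mgrid[0:h, 0:w]                 # row (v) and column (u) index grids
    valid = depth > 0
    pix = np.stack([u[valid], v[valid], np.ones(valid.sum())]).astype(float)  # (3, N) homogeneous
    rays = np.linalg.inv(K) @ pix             # viewing rays at unit depth
    pts = (rays * depth[valid]).T             # scale each ray by its depth
    return pts, pix[:2].T


def correspondences_3d_2d(depth1, K1, K2, R_21, t_21, image2_shape):
    """3D points in camera-1 coordinates paired with their pixel locations in image 2.

    Assumed (hypothetical) convention: X_cam2 = R_21 @ X_cam1 + t_21.
    No occlusion check is performed.
    """
    pts1, _ = backproject(depth1, K1)
    pts2 = pts1 @ R_21.T + t_21               # transform into the camera-2 frame
    in_front = pts2[:, 2] > 0                 # drop points behind camera 2
    proj = pts2[in_front] @ K2.T
    uv2 = proj[:, :2] / proj[:, 2:3]          # perspective division -> pixel coordinates
    h2, w2 = image2_shape
    # Keep projections that land inside image 2 (pixel centres at integer coordinates).
    inside = ((uv2 >= -0.5).all(axis=1)
              & (uv2[:, 0] < w2 - 0.5)
              & (uv2[:, 1] < h2 - 0.5))
    return pts1[in_front][inside], uv2[inside]
```

In a training loop, pairs like these are what a confidence model (the paper's FusionTransformer) would weight before the weighted RANSAC step; a stricter version would also discard points that are occluded in image 2 by comparing against that image's depth map.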


📝 Abstract

The prior method ReLoc3R achieves breakthrough performance with fast 25 ms inference and state-of-the-art regression accuracy, yet our analysis reveals subtle geometric inconsistencies in its internal representations that prevent it from reaching the precision ceiling of correspondence-based methods like MASt3R, which require 300 ms per pair. In this work, we present GeLoc3r, a novel approach to relative camera pose estimation that enhances pose regression methods through Geometric Consistency Regularization (GCR). GeLoc3r overcomes the speed-accuracy dilemma by training regression networks to produce geometrically consistent poses without inference-time geometric computation. During training, GeLoc3r leverages ground-truth depth to generate dense 3D-2D correspondences, weights them using a FusionTransformer that learns correspondence importance, and computes geometrically consistent poses via weighted RANSAC. This creates a consistency loss that transfers geometric knowledge into the regression network. Unlike FAR, which requires both regression and geometric solving at inference, GeLoc3r uses only the enhanced regression head at test time, maintaining ReLoc3R's speed while approaching MASt3R's accuracy. On challenging benchmarks, GeLoc3r consistently outperforms ReLoc3R: 40.45% vs. 34.85% AUC@5° on CO3Dv2 (a 16% relative improvement), 68.66% vs. 66.70% AUC@5° on RealEstate10K, and 50.45% vs. 49.60% on MegaDepth1500. By teaching geometric consistency during training rather than enforcing it at inference, GeLoc3r represents a paradigm shift in how neural networks learn camera geometry, achieving both the speed of regression and the geometric understanding of correspondence methods.
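The results above are reported as AUC@5°. As a rough guide to what that number measures, here is a minimal sketch of how such a score is commonly computed in relative pose benchmarks: compute each image pair's rotation error and translation-direction error, keep the larger of the two, and integrate the cumulative recall curve up to 5°, normalized so a perfect method scores 1. Whether each benchmark here uses exactly this protocol (for example, the max of the two errors, or no sign folding of the translation angle) is an assumption, and the helper names are hypothetical.

```python
import numpy as np


def rotation_error_deg(R_gt, R_est):
    """Geodesic angle between two rotation matrices, in degrees."""
    cos = (np.trace(R_gt.T @ R_est) - 1.0) / 2.0
    return float(np.degrees(np.arccos(np.clip(cos, -1.0, 1.0))))


def translation_error_deg(t_gt, t_est):
    """Angle between translation directions, in degrees (scale is unobservable)."""
    cos = np.dot(t_gt, t_est) / (np.linalg.norm(t_gt) * np.linalg.norm(t_est))
    return float(np.degrees(np.arccos(np.clip(cos, -1.0, 1.0))))


def pose_auc(errors, threshold):
    """Area under the recall-vs-error curve up to `threshold`, normalized to [0, 1]."""
    errors = np.sort(np.asarray(errors, dtype=float))
    recall = np.arange(1, len(errors) + 1) / len(errors)
    errors = np.concatenate([[0.0], errors])      # curve starts at (0 error, 0 recall)
    recall = np.concatenate([[0.0], recall])
    last = np.searchsorted(errors, threshold)     # points with error below the threshold
    e = np.concatenate([errors[:last], [threshold]])
    r = np.concatenate([recall[:last], [recall[last - 1]]])  # hold recall flat to the threshold
    area = np.sum((e[1:] - e[:-1]) * (r[1:] + r[:-1]) / 2.0)  # trapezoidal rule
    return float(area / threshold)
```

Per pair, one would take max(rotation_error_deg(...), translation_error_deg(...)) and pass the list of these errors to pose_auc(errors, 5.0). The same arithmetic clarifies the headline number: on CO3Dv2 the gain is 5.6 points absolute, and the 16% figure is relative, since (40.45 - 34.85) / 34.85 ≈ 0.161.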

Problem

Research questions and friction points this paper is trying to address.

Enhancing camera pose regression with geometric consistency regularization

Overcoming speed-accuracy tradeoff in relative pose estimation

Teaching geometric consistency during training rather than inference

Innovation

Methods, ideas, or system contributions that make the work stand out.

Geometric Consistency Regularization enhances pose regression

Training uses ground-truth depth and weighted RANSAC

Inference maintains fast speed without geometric computation
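The training-time side of the list above relies on weighted RANSAC. The sketch below shows one plausible reading of that step, assuming the per-correspondence weights come from a model such as the FusionTransformer: minimal 6-point DLT pose hypotheses are sampled with probability proportional to the weights, each hypothesis is scored by the total weight of its inliers, and the winner is refit on its inliers with weighted DLT. The paper's actual solver, sampling scheme, and thresholds are not given here; dlt_pnp, weighted_ransac_pnp, the threshold value, and the use of normalized camera coordinates (pixels multiplied by K^-1) are assumptions for illustration.

```python
import numpy as np


def dlt_pnp(X, x, w=None):
    """Weighted DLT pose from >= 6 non-coplanar points.

    X: (N, 3) 3D points; x: (N, 2) normalized image points; w: optional (N,) weights.
    Returns (R, t) with x ~ project(R @ X + t).
    """
    n = X.shape[0]
    w = np.ones(n) if w is None else np.asarray(w, dtype=float)
    Xh = np.hstack([X, np.ones((n, 1))])              # homogeneous 3D points, (N, 4)
    zeros = np.zeros((n, 4))
    rows_u = np.hstack([Xh, zeros, -x[:, :1] * Xh])   # P1.X - u * P3.X = 0
    rows_v = np.hstack([zeros, Xh, -x[:, 1:] * Xh])   # P2.X - v * P3.X = 0
    s = np.sqrt(w)[:, None]                           # weight each point's two equations
    A = np.vstack([rows_u * s, rows_v * s])
    P = np.linalg.svd(A)[2][-1].reshape(3, 4)         # null vector = scale * [R | t]
    if np.linalg.det(P[:, :3]) < 0:                   # fix the sign so the scale is positive
        P = -P
    U, S, Vt = np.linalg.svd(P[:, :3])
    R = U @ Vt                                        # closest rotation to the 3x3 block
    t = P[:, 3] / S.mean()                            # remove the unknown scale
    return R, t


def reprojection_errors(R, t, X, x):
    """Distance in normalized coordinates; points behind the camera get inf."""
    Xc = X @ R.T + t
    z = Xc[:, 2]
    err = np.full(X.shape[0], np.inf)
    front = z > 1e-9
    err[front] = np.linalg.norm(Xc[front, :2] / z[front, None] - x[front], axis=1)
    return err


def weighted_ransac_pnp(X, x, w, iters=200, thresh=1e-3, seed=0):
    """Confidence-weighted RANSAC: weighted sampling, weighted inlier score, weighted refit."""
    rng = np.random.default_rng(seed)
    p = w / w.sum()                                   # high-confidence matches are drawn more often
    best_score, best_inliers = -1.0, None
    for _ in range(iters):
        idx = rng.choice(len(X), size=6, replace=False, p=p)
        R, t = dlt_pnp(X[idx], x[idx])
        inliers = reprojection_errors(R, t, X, x) < thresh
        score = w[inliers].sum()                      # support = summed confidence of inliers
        if score > best_score:
            best_score, best_inliers = score, inliers
    if best_inliers.sum() < 6:
        raise ValueError("no hypothesis with enough inliers")
    R, t = dlt_pnp(X[best_inliers], x[best_inliers], w[best_inliers])
    return R, t, best_inliers
```

Weighting both the sampling and the score means low-confidence matches rarely seed a hypothesis and barely count toward its support, so the learned weights directly shape which pose wins. A production solver would more likely use a minimal P3P solver plus non-linear refinement; DLT is used here only because it fits in a few self-contained lines.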

🔎 Similar Papers

GS-CPR: Efficient Camera Pose Refinement via 3D Gaussian Splatting