Visible Structure Retrieval for Lightweight Image-Based Relocalisation

📅 2025-11-16
📈 Citations: 0
Influential: 0
🤖 AI Summary
For image-based relocalisation in large-scale scenes, existing methods rely on image retrieval or heuristic search, resulting in high computational complexity and substantial memory overhead. This paper introduces a new paradigm, "visible structure retrieval", which employs a lightweight neural network to directly predict, in an end-to-end manner, the set of 3D structure points visible from a given input image. By restricting conventional 2D-3D matching to this physically visible subset, the method eliminates the need for explicit retrieval or hand-crafted heuristics, jointly leveraging structured map priors and a differentiable matching mechanism. Evaluated on multiple large-scale benchmarks, the approach achieves state-of-the-art localisation accuracy while reducing matching search complexity to near-linear time and cutting memory consumption by approximately 60%, significantly improving both relocalisation efficiency and scalability.

📝 Abstract
Accurate camera pose estimation from an image observation in a previously mapped environment is commonly done through structure-based methods: by finding correspondences between 2D keypoints on the image and 3D structure points in the map. In order to make this correspondence search tractable in large scenes, existing pipelines either rely on search heuristics, or perform image retrieval to reduce the search space by comparing the current image to a database of past observations. However, these approaches result in elaborate pipelines or storage requirements that grow with the number of past observations. In this work, we propose a new paradigm for making structure-based relocalisation tractable. Instead of relying on image retrieval or search heuristics, we learn a direct mapping from image observations to the visible scene structure in a compact neural network. Given a query image, a forward pass through our novel visible structure retrieval network allows obtaining the subset of 3D structure points in the map that the image views, thus reducing the search space of 2D-3D correspondences. We show that our proposed method enables performing localisation with an accuracy comparable to the state of the art, while requiring a lower computational and storage footprint.
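The pipeline the abstract describes can be sketched in a few lines. The toy below is a hypothetical illustration, not the authors' implementation: `visible_structure_retrieval` stands in for the paper's compact network (here it simply returns a fixed index set rather than running a learned forward pass), and matching is plain nearest-neighbour search over descriptors. The point it demonstrates is the interface: one prediction step shrinks the 2D-3D correspondence search from the full map to the predicted-visible subset.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy map: N 3D structure points, each with a D-dim feature descriptor.
N, D = 1000, 16
map_descs = rng.normal(size=(N, D))

def visible_structure_retrieval(query_image_embedding):
    """Stand-in for the paper's network: one forward pass maps a query-image
    embedding to the indices of map points predicted visible. A real network
    would output per-point visibility logits; here we pretend it predicted
    the first 100 map points."""
    return np.arange(100)

def match_2d_3d(query_descs, candidate_idx, all_descs):
    """Nearest-neighbour 2D-3D matching restricted to the candidate subset."""
    cand = all_descs[candidate_idx]                                 # (C, D)
    d = np.linalg.norm(query_descs[:, None] - cand[None], axis=-1)  # (Q, C)
    return candidate_idx[np.argmin(d, axis=1)]                      # map indices

# Query keypoint descriptors: noisy copies of five visible map points.
true_idx = np.array([3, 17, 42, 60, 99])
query_descs = map_descs[true_idx] + 0.01 * rng.normal(size=(5, D))

visible = visible_structure_retrieval(query_image_embedding=None)
matches = match_2d_3d(query_descs, visible, map_descs)
print(matches.tolist())       # recovers the true map indices
print(len(visible), "of", N)  # search space reduced 10x before matching
```

In the actual method the recovered 2D-3D correspondences would then feed a standard PnP-with-RANSAC pose solver; the sketch stops at the matching stage, which is the part the paper's contribution accelerates.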
Problem

Research questions and friction points this paper is trying to address.

Achieving efficient camera pose estimation in large mapped environments
Reducing computational and storage requirements for image-based relocalization
Directly mapping image observations to visible 3D structure points
Innovation

Methods, ideas, or system contributions that make the work stand out.

Neural network directly maps images to visible 3D structure
Compact network retrieves visible scene structure subset
Reduces correspondence search space without image retrieval
Fereidoon Zangeneh
Division for Robotics, Perception and Learning (RPL), KTH Royal Institute of Technology, Stockholm, Sweden; Univrses AB, Stockholm, Sweden
Leonard Bruns
Niantic Spatial
Localization · Mapping · 3D Vision · Deep Learning
Amit Dekel
Nordita
Alessandro Pieropan
Univrses AB, Stockholm, Sweden
Patric Jensfelt
KTH Royal Institute of Technology
robotics