AI Summary
To address the degradation of keypoint matching performance under large viewpoint or appearance changes in camera relocalization, this paper proposes a novel paradigm for robust image feature descriptor generation based on differentiable voxel rendering. Our method constructs a globally sparse yet locally dense 3D voxel map and synthesizes matchable descriptors for arbitrary viewpoints via voxel rendering. This work is the first to integrate voxel rendering into feature representation learning, unifying globally sparse storage with locally dense rendering while enabling cross-view descriptor synthesis, thereby relaxing the conventional requirement of viewpoint consistency in matching. Evaluated on the 7-Scenes and Cambridge Landmarks datasets, our approach reduces median translation error by up to 39% in indoor scenes, significantly outperforming state-of-the-art methods; it remains competitive in outdoor scenarios while incurring lower memory and computational overhead.
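As a rough illustration of the rendering step described above, the sketch below alpha-composites per-voxel descriptors along a camera ray using standard volume-rendering weights. It is a minimal NumPy sketch under assumed interfaces: `voxels.query`, the sampling bounds, and all other names are hypothetical, not the paper's implementation.

```python
import numpy as np

def render_descriptor(ray_o, ray_d, voxels, t_near=0.1, t_far=10.0, n_samples=64):
    """Alpha-composite per-voxel descriptors along one camera ray.

    `voxels` is a hypothetical sparse grid exposing
        voxels.query(points) -> (density, descriptor) per 3D point,
    returning zeros for points that fall outside any occupied voxel.
    """
    # Sample depths uniformly along the ray and lift them to 3D points.
    t = np.linspace(t_near, t_far, n_samples)
    points = ray_o[None, :] + t[:, None] * ray_d[None, :]   # (n_samples, 3)

    sigma, desc = voxels.query(points)                      # (n,), (n, D)

    # Standard volume-rendering weights: w_i = T_i * (1 - exp(-sigma_i * dt)).
    dt = t[1] - t[0]
    alpha = 1.0 - np.exp(-sigma * dt)                       # per-sample opacity
    trans = np.cumprod(np.concatenate([[1.0], 1.0 - alpha[:-1]]))  # transmittance
    weights = trans * alpha

    # The rendered descriptor is the weighted sum of per-voxel descriptors;
    # renormalize so it stays comparable under cosine similarity.
    d = (weights[:, None] * desc).sum(axis=0)
    norm = np.linalg.norm(d)
    return d / norm if norm > 0 else d
```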
Abstract
Camera relocalization methods range from dense image alignment to direct camera pose regression from a query image. Among these, sparse feature matching stands out as an efficient, versatile, and generally lightweight approach with numerous applications. However, feature-based methods often struggle with significant viewpoint and appearance changes, leading to matching failures and inaccurate pose estimates. To overcome this limitation, we propose a novel approach that leverages a globally sparse yet locally dense 3D representation of 2D features. By tracking and triangulating landmarks over a sequence of frames, we construct a sparse voxel map optimized to render image patch descriptors observed during tracking. Given an initial pose estimate, we first synthesize descriptors from the voxels using volumetric rendering and then perform feature matching to estimate the camera pose. This methodology enables the generation of descriptors for unseen views, enhancing robustness to view changes. We extensively evaluate our method on the 7-Scenes and Cambridge Landmarks datasets. Our results show that our method significantly outperforms existing state-of-the-art feature representation techniques in indoor environments, achieving up to a 39% improvement in median translation error. Additionally, our approach yields comparable results to other methods for outdoor scenarios while maintaining lower memory and computational costs.
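To make the relocalization loop concrete, here is a small sketch of the match-then-solve step: descriptors are rendered from an initial pose estimate, matched to query keypoints by mutual nearest neighbor, and the pose is recovered with PnP + RANSAC. `voxel_map.render` is a hypothetical stand-in for the descriptor synthesis above, and `cv2.solvePnPRansac` is a standard OpenCV solver rather than necessarily what the authors use.

```python
import numpy as np
import cv2

def relocalize(query_kpts, query_desc, initial_pose, voxel_map, K):
    """Estimate the camera pose of a query image from rendered descriptors.

    query_kpts: (N, 2) keypoint pixel coordinates in the query image.
    query_desc: (N, D) L2-normalized query descriptors.
    voxel_map.render(pose): hypothetical call returning (points_3d, descriptors)
        for landmarks visible from `pose`, with L2-normalized descriptors.
    K: (3, 3) camera intrinsics.
    """
    points_3d, rendered_desc = voxel_map.render(initial_pose)

    # Mutual nearest-neighbor matching via cosine similarity.
    sim = query_desc @ rendered_desc.T
    nn12 = sim.argmax(axis=1)
    nn21 = sim.argmax(axis=0)
    mutual = nn21[nn12] == np.arange(len(query_desc))
    if mutual.sum() < 4:        # PnP needs at least 4 correspondences
        return None

    obj = points_3d[nn12[mutual]].astype(np.float64)
    img = query_kpts[mutual].astype(np.float64)

    # Robust pose estimation from the 2D-3D correspondences.
    ok, rvec, tvec, inliers = cv2.solvePnPRansac(
        obj, img, K, distCoeffs=None, reprojectionError=3.0)
    return (rvec, tvec) if ok else None
```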