Enhancing Scene Coordinate Regression with Efficient Keypoint Detection and Sequential Information

📅 2024-12-09
📈 Citations: 1
Influential: 0
📄 PDF
🤖 AI Summary
Scene coordinate regression (SCR) suffers from degraded localization accuracy in repetitive-texture and textureless regions due to failure of implicit triangulation. To address this, we propose a unified sequential enhancement framework for SCR. Our key contributions are: (1) the first end-to-end architecture jointly encoding scene features and detecting salient keypoints, explicitly modeling scene saliency to suppress interference from non-informative regions; and (2) the first integration of temporal modeling—via LSTM or GRU—into both the coordinate mapping and relocalization stages of SCR, enforcing inter-frame geometric consistency. The method supports dual-mode inference: single-frame mode achieves a 6.4% recall gain at 90 Hz, while sequence mode yields an 11% recall improvement with unchanged computational efficiency. Extensive evaluation on multi-scale indoor and outdoor benchmarks demonstrates consistent superiority over state-of-the-art methods.

Technology Category

Application Category

📝 Abstract
Scene Coordinate Regression (SCR) is a visual localization technique that utilizes deep neural networks (DNN) to directly regress 2D-3D correspondences for camera pose estimation. However, current SCR methods often face challenges in handling repetitive textures and meaningless areas due to their reliance on implicit triangulation. In this paper, we propose an efficient and accurate SCR system. Compared to existing SCR methods, we propose a unified architecture for both scene encoding and salient keypoint detection, allowing our system to prioritize the encoding of informative regions. This design significantly improves computational efficiency. Additionally, we introduce a mechanism that utilizes sequential information during both mapping and relocalization. The proposed method enhances the implicit triangulation, especially in environments with repetitive textures. Comprehensive experiments conducted across indoor and outdoor datasets demonstrate that the proposed system outperforms state-of-the-art (SOTA) SCR methods. Our single-frame relocalization mode improves the recall rate of our baseline by 6.4% and increases the running speed from 56Hz to 90Hz. Furthermore, our sequence-based mode increases the recall rate by 11% while maintaining the original efficiency.
Problem

Research questions and friction points this paper is trying to address.

Improving SCR accuracy in repetitive texture environments
Enhancing computational efficiency in scene coordinate regression
Utilizing sequential information for better mapping and relocalization
Innovation

Methods, ideas, or system contributions that make the work stand out.

Unified architecture for scene encoding and keypoint detection
Utilizes sequential information for mapping and relocalization
Enhances implicit triangulation in repetitive texture environments
🔎 Similar Papers
No similar papers found.
Kuan Xu
Kuan Xu
Nanyang Technological University
roboticsvisual SLAM
Zeyu Jiang
Zeyu Jiang
Nanyang Technological University
RoboticsComputer VisionDeep Learning
Haozhi Cao
Haozhi Cao
Nanyang Technological University
Multi-Modal SegmentationDomain AdaptationVideo Analysis
S
Shenghai Yuan
Centre for Advanced Robotics Technology Innovation (CARTIN), School of Electrical and Electronic Engineering, Nanyang Technological University, 50 Nanyang Avenue, Singapore 639798
C
Chen Wang
Spatial AI & Robotics Lab, Department of Computer Science and Engineering, University at Buffalo, Buffalo, NY 14260
Lihua Xie
Lihua Xie
Professor of Electrical Engineering, Nanyang Technological University
Robust controlNetworked ControlMult-agent Systems