PLGSLAM: Progressive Neural Scene Representation with Local-to-Global Bundle Adjustment

📅 2023-12-15
🏛️ Computer Vision and Pattern Recognition
📈 Citations: 60
Influential: 1
🤖 AI Summary
Existing neural implicit SLAM methods suffer from poor reconstruction quality and severe pose drift in large-scale indoor scenes and long sequences, primarily due to limited global radiance field capacity and insufficient robustness of end-to-end pose networks. To address these issues, we propose a progressive local-to-global neural SLAM framework: (1) a sliding-window-based dynamic local modeling mechanism that jointly leverages tri-plane representations for high-frequency geometry and MLPs for low-frequency features; and (2) a local-global joint bundle adjustment scheme integrated with a global keyframe database to enable hierarchical optimization and real-time tracking. Evaluated on multi-scale indoor datasets, our method significantly outperforms state-of-the-art approaches—improving reconstruction fidelity (higher PSNR/SSIM), localization stability (37% reduction in absolute trajectory error), and runtime efficiency (>25 FPS). To the best of our knowledge, this is the first work to achieve high-fidelity surface reconstruction and robust, drift-resistant camera tracking in large-scale indoor environments.
📝 Abstract
Neural implicit scene representations have recently shown encouraging results in dense visual SLAM. However, existing methods produce low-quality scene reconstructions and inaccurate localization when scaled up to large indoor scenes and long sequences. These limitations stem mainly from their single, global radiance field with finite capacity, which does not adapt to large scenarios. Their end-to-end pose networks are also not robust enough as cumulative errors grow in large scenes. To this end, we introduce PLGSLAM, a neural visual SLAM system capable of high-fidelity surface reconstruction and robust camera tracking in real time. To handle large-scale indoor scenes, PLGSLAM proposes a progressive scene representation method that dynamically allocates a new local scene representation trained with frames within a local sliding window. This allows us to scale up to larger indoor scenes and improves robustness (even under pose drift). In the local scene representation, PLGSLAM combines tri-planes for local high-frequency features with multilayer perceptron (MLP) networks for low-frequency features, achieving smoothness and scene completion in unobserved areas. Moreover, we propose a local-to-global bundle adjustment method with a global keyframe database to address the increased pose drift on long sequences. Experimental results demonstrate that PLGSLAM achieves state-of-the-art scene reconstruction and tracking performance across various datasets and scenarios (both small- and large-scale indoor environments).
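As a rough illustration of the local representation the abstract describes, the sketch below queries three axis-aligned feature planes (xy, xz, yz) by bilinear interpolation and sums the results, the kind of tri-plane lookup that would feed a small MLP decoder for low-frequency detail. The plane resolution, channel count, and function names here are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def bilinear_sample(plane, u, v):
    """Bilinearly interpolate a (R, R, C) feature plane at
    continuous coordinates (u, v) in [0, 1]."""
    R = plane.shape[0]
    x, y = u * (R - 1), v * (R - 1)
    x0, y0 = int(np.floor(x)), int(np.floor(y))
    x1, y1 = min(x0 + 1, R - 1), min(y0 + 1, R - 1)
    fx, fy = x - x0, y - y0
    return ((1 - fx) * (1 - fy) * plane[x0, y0]
            + fx * (1 - fy) * plane[x1, y0]
            + (1 - fx) * fy * plane[x0, y1]
            + fx * fy * plane[x1, y1])

def triplane_feature(planes, p):
    """Sum features sampled from the xy, xz, and yz planes at 3D point p."""
    x, y, z = p
    return (bilinear_sample(planes["xy"], x, y)
            + bilinear_sample(planes["xz"], x, z)
            + bilinear_sample(planes["yz"], y, z))

rng = np.random.default_rng(0)
R, C = 32, 8  # plane resolution and feature channels (illustrative values)
planes = {k: rng.normal(size=(R, R, C)) for k in ("xy", "xz", "yz")}
feat = triplane_feature(planes, (0.3, 0.7, 0.5))
print(feat.shape)  # (8,)
```

In a full system this feature vector would be decoded by a small MLP into SDF or color values; the three planes keep memory quadratic rather than cubic in resolution, which is what makes per-window local fields affordable.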
Problem

Research questions and friction points this paper is trying to address.

Low-quality reconstruction in large indoor scenes
Inaccurate localization in long sequences
Single global radiance field limits scalability
Innovation

Methods, ideas, or system contributions that make the work stand out.

Progressive scene representation with dynamic local allocation
Tri-planes and MLP for local feature optimization
Local-to-global bundle adjustment for pose drift reduction
Tianchen Deng
Shanghai Jiao Tong University
Robotics · Computer Vision
Guole Shen
Shanghai Jiao Tong University
Tong Qin
Shanghai Jiao Tong University
Robotics · SLAM · Computer Vision
Jianyu Wang
Shanghai Jiao Tong University
Wentao Zhao
Shanghai Jiao Tong University
Jingchuan Wang
Shanghai Jiao Tong University
Danwei Wang
Professor, Nanyang Technological University
Robotics · Control Engineering · Fault Diagnosis
Weidong Chen
Shanghai Jiao Tong University