🤖 AI Summary
To address drift in visual-inertial odometry (VIO) for context-aware applications such as augmented reality (AR), this paper proposes NeRF-VIO, a map-based visual-inertial localization algorithm that is initialized from a neural radiance field (NeRF) prior map. The initialization model is a multilayer perceptron (MLP) trained with a loss defined as the geodesic distance on SE(3), which keeps the model invariant under a change of reference frame. A two-stage update mechanism integrated into the multi-state constraint Kalman filter (MSCKF) framework then constrains the state with both images captured by the onboard camera and images rendered from the pre-trained NeRF model. In the reported evaluation, the initialization model outperforms an existing NeRF-based initialization solution in accuracy and efficiency, and on a real-world AR dataset the two-stage update pipeline outperforms MSCKF across all data sequences.
📝 Abstract
A prior map serves as a foundational reference for localization in context-aware applications such as augmented reality (AR). Because it provides contextual information about the environment, the prior map is a vital tool for mitigating drift. In this paper, we propose a map-based visual-inertial localization algorithm (NeRF-VIO) with initialization using neural radiance fields (NeRF). Our algorithm utilizes a multilayer perceptron model and redefines the loss function as the geodesic distance on \(SE(3)\), ensuring the invariance of the initialization model under a frame change within \(\mathfrak{se}(3)\). The evaluation demonstrates that our model outperforms an existing NeRF-based initialization solution in both accuracy and efficiency. By integrating a two-stage update mechanism within a multi-state constraint Kalman filter (MSCKF) framework, the state of NeRF-VIO is constrained by both captured images from an onboard camera and rendered images from a pre-trained NeRF model. The proposed algorithm is validated using a real-world AR dataset, and the results indicate that our two-stage update pipeline outperforms MSCKF across all data sequences.
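The abstract does not spell out the loss, so the following is only a minimal sketch of one common way to measure the discrepancy between two poses on \(SE(3)\): the norm of the logarithm of the relative transform, \(\lVert \log(T_{\text{pred}}^{-1} T_{\text{true}}) \rVert\). This quantity may differ from the paper's exact definition of geodesic distance. All function names (`pose_distance`, `se3_log`, `make_pose`, and so on) are hypothetical and chosen for illustration; the equal weighting of rotation and translation is also an assumption.

```python
import numpy as np

def hat(v):
    """Map a 3-vector to its skew-symmetric matrix."""
    return np.array([[0.0, -v[2], v[1]],
                     [v[2], 0.0, -v[0]],
                     [-v[1], v[0], 0.0]])

def so3_exp(phi):
    """Rodrigues formula: rotation vector -> rotation matrix."""
    theta = np.linalg.norm(phi)
    K = hat(phi)
    if theta < 1e-8:
        return np.eye(3) + K  # first-order approximation
    return (np.eye(3) + np.sin(theta) / theta * K
            + (1.0 - np.cos(theta)) / theta**2 * K @ K)

def so3_log(R):
    """Rotation matrix -> rotation vector (not robust for angles near pi)."""
    w = np.array([R[2, 1] - R[1, 2], R[0, 2] - R[2, 0], R[1, 0] - R[0, 1]])
    s = 0.5 * np.linalg.norm(w)        # sin(theta)
    c = 0.5 * (np.trace(R) - 1.0)      # cos(theta)
    if s < 1e-8:
        return 0.5 * w                 # small-angle case
    return np.arctan2(s, c) / (2.0 * s) * w

def se3_log(T):
    """4x4 homogeneous transform -> 6-vector twist (rho, phi)."""
    R, t = T[:3, :3], T[:3, 3]
    phi = so3_log(R)
    theta = np.linalg.norm(phi)
    K = hat(phi)
    if theta < 1e-8:
        V = np.eye(3) + 0.5 * K
    else:
        V = (np.eye(3) + (1.0 - np.cos(theta)) / theta**2 * K
             + (theta - np.sin(theta)) / theta**3 * K @ K)
    rho = np.linalg.solve(V, t)        # rho = V^{-1} t
    return np.concatenate([rho, phi])

def make_pose(phi, t):
    """Build a 4x4 pose from a rotation vector and a translation."""
    T = np.eye(4)
    T[:3, :3] = so3_exp(np.asarray(phi, dtype=float))
    T[:3, 3] = t
    return T

def pose_distance(T_pred, T_true):
    """Norm of the twist of the relative pose T_pred^{-1} T_true."""
    T_rel = np.linalg.inv(T_pred) @ T_true
    return np.linalg.norm(se3_log(T_rel))

if __name__ == "__main__":
    T1 = make_pose([0.1, -0.2, 0.3], [1.0, 2.0, 3.0])
    T2 = make_pose([0.0, 0.4, -0.1], [0.5, -1.0, 2.0])
    G = make_pose([0.7, 0.1, -0.3], [-2.0, 0.3, 1.1])  # arbitrary frame change
    # Re-expressing both poses in another world frame leaves the value unchanged.
    print(pose_distance(T1, T2), pose_distance(G @ T1, G @ T2))
```

Because the value depends only on the relative pose, applying the same transform `G` on the left of both poses does not change it. This is the kind of frame-change invariance the abstract refers to, which a loss on raw translation and rotation parameters would not generally have.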