🤖 AI Summary
To address inaccurate depth estimation, unmodeled depth uncertainty, and the resulting global geometric inconsistency in visual-inertial SLAM, which hinder real-time robot planning, this paper proposes an uncertainty-aware, tightly coupled VIO-SLAM framework. Methodologically, it introduces the first deep integration of motion stereo and learned depth estimation: pixel-wise depth and its uncertainty, both output by the network, are propagated via reprojection and IMU preintegration into voxel occupancy probabilities and submap alignment factors, yielding a globally consistent and scalable dense submap representation. The framework combines deep depth estimation, probabilistic factor-graph optimization, voxel-hashed occupancy mapping, and nonlinear least-squares estimation. Evaluated on the EuRoC and TUM-VI benchmarks, it outperforms state-of-the-art methods in both localization and mapping accuracy, while generating, in real time, high-fidelity, confidence-aware voxel occupancy maps directly usable for downstream robotic planning and control.
📝 Abstract
We propose a visual-inertial simultaneous localization and mapping (SLAM) system that tightly couples sparse reprojection errors, inertial measurement unit pre-integration factors, and relative pose factors with dense volumetric occupancy mapping, fusing depth predictions from a deep neural network in a fully probabilistic manner. Specifically, our method is rigorously uncertainty-aware: first, we use depth and uncertainty predictions from a deep network not only for the robot's stereo rig, but also probabilistically fuse motion stereo, which provides depth information across a range of baselines and thereby drastically increases mapping accuracy. Second, the predicted and fused depth uncertainty propagates not only into occupancy probabilities but also into alignment factors between the generated dense submaps, which enter the probabilistic nonlinear least-squares estimator. This submap representation offers globally consistent geometry at scale. We thoroughly evaluate our method on two benchmark datasets, achieving localization and mapping accuracy that exceeds the state of the art, while offering volumetric occupancy directly usable for downstream robotic planning and control in real time.
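To make the two uncertainty-handling steps concrete, here is a minimal, self-contained sketch of the standard building blocks the abstract alludes to: inverse-variance fusion of several depth hypotheses per pixel (e.g. from the static stereo rig and multiple motion-stereo baselines), and an uncertainty-aware inverse sensor model that turns a fused depth and its standard deviation into a voxel log-odds update. The function names, the specific sensor-model shape, and the constants `l_free`/`l_occ` are illustrative assumptions, not the paper's actual implementation.

```python
from math import erf, sqrt

def fuse_depths(depths, variances):
    """Inverse-variance (maximum-likelihood Gaussian) fusion of per-pixel
    depth estimates, e.g. from static stereo plus several motion-stereo
    baselines. Returns the fused depth and its (reduced) variance."""
    weights = [1.0 / v for v in variances]           # precision weights
    var_fused = 1.0 / sum(weights)
    d_fused = var_fused * sum(w * d for w, d in zip(weights, depths))
    return d_fused, var_fused

def inverse_sensor_logodds(x, d, sigma, l_free=-0.4, l_occ=0.85):
    """Illustrative uncertainty-aware inverse sensor model: log-odds update
    for a voxel at distance x along the ray, given measured depth d with
    standard deviation sigma. A Gaussian CDF interpolates between free-space
    and occupied evidence, so a less confident depth (larger sigma) spreads
    the transition and commits less strongly near the surface. In practice
    the update would be truncated a few sigma behind d."""
    cdf = 0.5 * (1.0 + erf((x - d) / (sigma * sqrt(2.0))))
    return (1.0 - cdf) * l_free + cdf * l_occ

# Fuse a noisier stereo-rig depth with a wide-baseline motion-stereo depth:
d, var = fuse_depths([2.0, 2.2], [0.04, 0.01])   # -> 2.16 m, variance 0.008
```

Note how the fused variance is always smaller than the best individual one, which is what lets motion stereo across many baselines sharpen the map; the same per-pixel variance then scales the occupancy evidence, so uncertain depths contribute softer updates.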