🤖 AI Summary
To address the limited real-time performance, accuracy, and robustness of multi-sensor SLAM in large-scale environments, this paper proposes a tightly coupled multimodal SLAM system that fuses visual, IMU, learned or measured depth, LiDAR, and GNSS data to construct a globally consistent, dense volumetric occupancy map. Methodologically, it employs a keyframe-based nonlinear optimization framework; introduces submap alignment factors to tightly couple mapping and state estimation; supports online camera extrinsic calibration and both loose and tight GNSS integration; and designs an efficient submap management strategy for real-time operation. Evaluated on the EuRoC, Hilti22, and VBR benchmarks, the system maintains real-time performance on sequences of up to 9 km, achieves state-of-the-art localization accuracy, and generates navigation-ready maps, significantly improving mapping completeness and state estimation robustness in complex scenarios.
📝 Abstract
To empower mobile robots with usable maps as well as the highest state estimation accuracy and robustness, we present OKVIS2-X: a state-of-the-art multi-sensor Simultaneous Localization and Mapping (SLAM) system that builds dense volumetric occupancy maps while scaling to large environments and operating in real time. Our unified SLAM framework seamlessly integrates different sensor modalities: visual, inertial, measured or learned depth, LiDAR, and Global Navigation Satellite System (GNSS) measurements. Unlike most state-of-the-art SLAM systems, we advocate using dense volumetric map representations when leveraging depth- or range-sensing capabilities. We employ an efficient submapping strategy that allows our system to scale to large environments, showcased on sequences of up to 9 kilometers. OKVIS2-X enhances its accuracy and robustness by tightly coupling the estimator and submaps through map alignment factors. Our system provides globally consistent maps that are directly usable for autonomous navigation. To further improve the accuracy of OKVIS2-X, we also incorporate the option of performing online calibration of camera extrinsics. Our system achieves the highest trajectory accuracy on EuRoC against state-of-the-art alternatives, outperforms all competitors on the Hilti22 VI-only benchmark while remaining competitive in the LiDAR version, and showcases state-of-the-art accuracy on the diverse and large-scale sequences of the VBR dataset.
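To give intuition for how a map alignment factor can constrain drift inside a nonlinear least-squares estimator, here is a minimal, self-contained 1D toy sketch. It is *not* the paper's formulation: OKVIS2-X aligns dense volumetric submaps in SE(3) within a keyframe-based optimizer, whereas this sketch uses scalar poses, synthetic odometry residuals, one relative "alignment" factor, and plain gradient descent. All numbers and function names are illustrative.

```python
# Toy illustration only: 1D poses, squared relative-pose residuals,
# and one extra "submap alignment" factor that counteracts odometry drift.

def optimize(poses, factors, fixed={0}, lr=0.1, iters=2000):
    """Gradient descent on the sum of squared relative-pose residuals.

    factors: list of (i, j, measurement) meaning poses[j] - poses[i] ~ measurement.
    fixed:   indices held constant (gauge freedom, like anchoring the first keyframe).
    """
    poses = list(poses)
    for _ in range(iters):
        grad = [0.0] * len(poses)
        for i, j, meas in factors:
            r = (poses[j] - poses[i]) - meas  # residual of one factor
            grad[j] += 2.0 * r
            grad[i] -= 2.0 * r
        for k in range(len(poses)):
            if k not in fixed:
                poses[k] -= lr * grad[k]
    return poses

# Three noisy odometry factors (drifting) plus one alignment factor (0 -> 3)
# playing the role of a drift-free submap-to-submap constraint.
factors = [(0, 1, 1.1), (1, 2, 0.9), (2, 3, 1.05),  # odometry
           (0, 3, 3.0)]                              # map alignment
poses = optimize([0.0, 1.1, 2.0, 3.05], factors)
# poses[3] is pulled from the dead-reckoned 3.05 toward the alignment value 3.0
```

The qualitative takeaway matches the tight-coupling idea in the abstract: because the alignment residual enters the same cost as the odometry residuals, the optimizer redistributes the correction across all poses instead of snapping only the last one.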