OKVIS2-X: Open Keyframe-based Visual-Inertial SLAM Configurable with Dense Depth or LiDAR, and GNSS

📅 2025-10-06
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the limited real-time performance, accuracy, and robustness of multi-sensor SLAM in large-scale environments, this paper proposes a tightly coupled multimodal SLAM system that fuses visual, IMU, learned or measured depth, LiDAR, and GNSS data to construct a globally consistent, dense voxel occupancy map. Methodologically, it employs a keyframe-based nonlinear optimization framework; introduces submap alignment factors to tightly couple mapping and state estimation; supports online calibration of camera extrinsics and both loose and tight GNSS integration; and designs an efficient submap management strategy for real-time operation. Evaluated on the EuRoC, Hilti22, and VBR benchmarks, the system maintains real-time performance on sequences of up to 9 km, achieves state-of-the-art localization accuracy, and generates navigation-ready maps, significantly improving mapping completeness and state-estimation robustness in complex scenarios.
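The submap alignment factors described above can be illustrated with a toy residual: surface points expressed in one submap's frame should fall on the zero level set of a neighboring submap's signed distance field. The following is a hypothetical 2D sketch (translation-only pose, analytic circular SDF in place of a voxel grid lookup, numeric Jacobians), not the paper's actual factor formulation:

```python
import numpy as np

def sdf_circle(pts, radius=2.0):
    # Stand-in for a submap's signed distance lookup: distance of 2D
    # points to a circle of the given radius (negative inside).
    return np.linalg.norm(pts, axis=1) - radius

def residuals(t, pts):
    # Alignment residuals: points shifted by the relative offset t
    # should lie on the other submap's surface (zero signed distance).
    return sdf_circle(pts + t)

def gauss_newton(pts, t0, iters=20, eps=1e-6):
    # Translation-only Gauss-Newton with numeric Jacobians (a sketch;
    # a real estimator would optimize full SE(3) poses analytically).
    t = np.asarray(t0, dtype=float)
    for _ in range(iters):
        r = residuals(t, pts)
        J = np.empty((len(pts), 2))
        for k in range(2):
            d = np.zeros(2); d[k] = eps
            J[:, k] = (residuals(t + d, pts) - r) / eps
        t = t - np.linalg.solve(J.T @ J, J.T @ r)
    return t

# Surface points observed in submap A's frame: the circle shifted by the
# (unknown) true relative offset (0.3, -0.2) that alignment should recover.
angles = np.linspace(0.0, 2.0 * np.pi, 40, endpoint=False)
circle = 2.0 * np.stack([np.cos(angles), np.sin(angles)], axis=1)
pts_a = circle - np.array([0.3, -0.2])

t_est = gauss_newton(pts_a, [0.0, 0.0])
```

In the actual system such residuals would enter the keyframe-based nonlinear optimization alongside visual and inertial terms, which is what tightly couples mapping and state estimation.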

📝 Abstract
To empower mobile robots with usable maps as well as the highest state estimation accuracy and robustness, we present OKVIS2-X: a state-of-the-art multi-sensor Simultaneous Localization and Mapping (SLAM) system building dense volumetric occupancy maps, while scalable to large environments and operating in real time. Our unified SLAM framework seamlessly integrates different sensor modalities: visual, inertial, measured or learned depth, LiDAR, and Global Navigation Satellite System (GNSS) measurements. Unlike most state-of-the-art SLAM systems, we advocate using dense volumetric map representations when leveraging depth or range-sensing capabilities. We employ an efficient submapping strategy that allows our system to scale to large environments, showcased in sequences of up to 9 kilometers. OKVIS2-X enhances its accuracy and robustness by tightly coupling the estimator and submaps through map alignment factors. Our system provides globally consistent maps, directly usable for autonomous navigation. To further improve the accuracy of OKVIS2-X, we also incorporate the option of performing online calibration of camera extrinsics. Our system achieves the highest trajectory accuracy on EuRoC against state-of-the-art alternatives, outperforms all competitors on the Hilti22 VI-only benchmark while also proving competitive in the LiDAR version, and showcases state-of-the-art accuracy on the diverse and large-scale sequences of the VBR dataset.
Problem

Research questions and friction points this paper is trying to address.

Develops a multi-sensor SLAM system for accurate robot state estimation
Integrates visual, inertial, depth, LiDAR and GNSS for robust mapping
Creates globally consistent dense volumetric maps for autonomous navigation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Integrates visual, inertial, depth, LiDAR, and GNSS sensors
Uses dense volumetric submapping for scalable large environments
Enhances accuracy with tight map alignment and online calibration
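The loose GNSS integration mentioned in the summary requires expressing GNSS fixes and the SLAM trajectory in a common frame. Below is a hedged sketch (not the paper's implementation) of the standard closed-form Kabsch/Umeyama step that recovers a 2D rotation and translation between trajectory positions and GNSS positions already converted to a local ENU frame:

```python
import numpy as np

def align_2d(traj_xy, gnss_xy):
    # Closed-form rigid (yaw + translation) alignment of corresponding
    # 2D point sets (Kabsch/Umeyama without scale). In a loose GNSS
    # coupling, aligned fixes could then serve as position priors.
    mu_t = traj_xy.mean(axis=0)
    mu_g = gnss_xy.mean(axis=0)
    H = (traj_xy - mu_t).T @ (gnss_xy - mu_g)   # cross-covariance
    U, _, Vt = np.linalg.svd(H)
    # Guard against reflections so the result is a proper rotation.
    S = np.diag([1.0, np.sign(np.linalg.det(Vt.T @ U.T))])
    R = Vt.T @ S @ U.T
    t = mu_g - R @ mu_t
    return R, t

# Synthetic check: a trajectory rotated by 30 degrees and shifted should
# be recovered exactly from noise-free correspondences.
rng = np.random.default_rng(0)
traj = rng.uniform(-5.0, 5.0, size=(50, 2))
th = np.deg2rad(30.0)
R_true = np.array([[np.cos(th), -np.sin(th)],
                   [np.sin(th),  np.cos(th)]])
t_true = np.array([10.0, -3.0])
gnss = traj @ R_true.T + t_true

R_est, t_est = align_2d(traj, gnss)
```

Tight GNSS integration, by contrast, would add the raw measurements as factors inside the nonlinear optimization itself rather than aligning trajectories after the fact.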