Benchmarking Egocentric Visual-Inertial SLAM at City Scale

📅 2025-09-30

🤖 AI Summary

Existing benchmarks either do not reflect the key challenges of egocentric visual-inertial SLAM on wearable devices (diverse motions and viewpoints, prevalent dynamic visual content, and long sessions with time-varying sensor calibration) or do not offer sufficiently accurate ground-truth poses. This work introduces a city-scale, multi-modal dataset and benchmark recorded with glasses-like devices over hours and kilometers of trajectories through a city center. Control points measured with surveying tools serve as indirect pose annotations that are metric and centimeter-accurate, and tracks with different levels of difficulty support in-depth analysis and the evaluation of less mature approaches. Experiments show that state-of-the-art academic SLAM systems are not robust on extreme trajectories, such as walking at night or traveling in a vehicle, and the authors identify the components responsible. The dataset and benchmark are publicly available.

📝 Abstract

Precise 6-DoF simultaneous localization and mapping (SLAM) from onboard sensors is critical for wearable devices capturing egocentric data, which exhibits specific challenges, such as a wider diversity of motions and viewpoints, prevalent dynamic visual content, or long sessions affected by time-varying sensor calibration. While recent progress on SLAM has been swift, academic research is still driven by benchmarks that do not reflect these challenges or do not offer sufficiently accurate ground truth poses. In this paper, we introduce a new dataset and benchmark for visual-inertial SLAM with egocentric, multi-modal data. We record hours and kilometers of trajectories through a city center with glasses-like devices equipped with various sensors. We leverage surveying tools to obtain control points as indirect pose annotations that are metric, centimeter-accurate, and available at city scale. This makes it possible to evaluate extreme trajectories that involve walking at night or traveling in a vehicle. We show that state-of-the-art systems developed by academia are not robust to these challenges and we identify components that are responsible for this. In addition, we design tracks with different levels of difficulty to ease in-depth analysis and evaluation of less mature approaches. The dataset and benchmark are available at https://www.lamaria.ethz.ch.
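To make the idea of control points as indirect pose annotations concrete, here is a minimal sketch of how an estimated trajectory could be scored against sparse surveyed points. It is not the benchmark's actual evaluation protocol, which this page does not describe. It assumes that each control point has already been associated with one estimated device position, that only the horizontal plane is considered, and that a least-squares rigid alignment (rotation plus translation, no scale, since the annotations are metric) is acceptable. The function names `align_2d`, `control_point_errors`, and `rmse` are hypothetical.

```python
import math


def align_2d(est, ref):
    """Least-squares rigid alignment (rotation + translation) of est onto ref.

    est, ref: equal-length lists of (x, y) tuples in meters.
    Returns the rotation angle in radians and the translation (tx, ty).
    """
    n = len(est)
    ex = sum(p[0] for p in est) / n
    ey = sum(p[1] for p in est) / n
    rx = sum(q[0] for q in ref) / n
    ry = sum(q[1] for q in ref) / n
    dot = 0.0
    cross = 0.0
    for (px, py), (qx, qy) in zip(est, ref):
        ax, ay = px - ex, py - ey  # centered estimate
        bx, by = qx - rx, qy - ry  # centered control point
        dot += ax * bx + ay * by
        cross += ax * by - ay * bx
    theta = math.atan2(cross, dot)  # closed-form optimal 2D rotation
    c, s = math.cos(theta), math.sin(theta)
    tx = rx - (c * ex - s * ey)
    ty = ry - (s * ex + c * ey)
    return theta, (tx, ty)


def control_point_errors(est, ref):
    """Distance in meters from each aligned estimate to its control point."""
    theta, (tx, ty) = align_2d(est, ref)
    c, s = math.cos(theta), math.sin(theta)
    errors = []
    for (px, py), (qx, qy) in zip(est, ref):
        x = c * px - s * py + tx
        y = s * px + c * py + ty
        errors.append(math.hypot(x - qx, y - qy))
    return errors


def rmse(errors):
    """Root-mean-square of a list of per-point errors."""
    return math.sqrt(sum(e * e for e in errors) / len(errors))


if __name__ == "__main__":
    # Made-up coordinates for illustration only, not data from the benchmark.
    surveyed = [(0.0, 0.0), (10.0, 0.0), (10.0, 10.0), (0.0, 10.0)]
    estimated = [(0.1, -0.2), (10.3, 0.1), (9.8, 10.4), (-0.2, 9.9)]
    print(rmse(control_point_errors(estimated, surveyed)))
```

Because the alignment removes only a global rotation and translation, any residual error reflects drift or distortion accumulated along the trajectory rather than the arbitrary choice of the SLAM system's reference frame.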

Problem

Research questions and friction points this paper is trying to address.

Benchmarking visual-inertial SLAM for egocentric wearable devices at city scale

Capturing challenges such as diverse motions and viewpoints, dynamic visual content, and time-varying sensor calibration

Providing centimeter-accurate control points as indirect pose ground truth for trajectory evaluation

Innovation

Methods, ideas, or system contributions that make the work stand out.

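Created a city-scale, multi-modal egocentric visual-inertial SLAM dataset recorded with glasses-like devices

Used surveying tools to obtain metric, centimeter-accurate control points as indirect pose annotations

Designed tracks with different levels of difficulty to ease in-depth analysis and the evaluation of less mature approaches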
