Have We Mastered Scale in Deep Monocular Visual SLAM? The ScaleMaster Dataset and Benchmark

📅 2026-02-20

🤖 AI Summary

This work addresses intra-session scale drift and inter-session scale ambiguity in deep monocular visual SLAM within large-scale indoor environments, two problems that existing benchmarks fail to evaluate effectively. To this end, the authors propose ScaleMaster, the first large-scale indoor SLAM benchmark specifically designed to assess scale consistency. It features multi-floor layouts, long trajectories, repeated viewpoints, and low-texture regions, and it pairs high-fidelity 3D ground truth with map-level metrics such as Chamfer distance, going beyond conventional trajectory-only evaluation. Systematic benchmarking shows that state-of-the-art methods commonly suffer severe scale failures in realistic, complex settings. The dataset and baseline results are released as a foundation for future research on scale-consistent SLAM.

📝 Abstract

Recent advances in deep monocular visual Simultaneous Localization and Mapping (SLAM) have achieved impressive accuracy and dense reconstruction capabilities, yet their robustness to scale inconsistency in large-scale indoor environments remains largely unexplored. Existing benchmarks are limited to room-scale or structurally simple settings, leaving critical issues of intra-session scale drift and inter-session scale ambiguity insufficiently addressed. To fill this gap, we introduce the ScaleMaster Dataset, the first benchmark explicitly designed to evaluate scale consistency under challenging scenarios such as multi-floor structures, long trajectories, repetitive views, and low-texture regions. We systematically analyze the vulnerability of state-of-the-art deep monocular visual SLAM systems to scale inconsistency, providing both quantitative and qualitative evaluations. Crucially, our analysis extends beyond traditional trajectory metrics to include a direct map-to-map quality assessment using metrics like Chamfer distance against high-fidelity 3D ground truth. Our results reveal that while recent deep monocular visual SLAM systems demonstrate strong performance on existing benchmarks, they suffer from severe scale-related failures in realistic, large-scale indoor environments. By releasing the ScaleMaster dataset and baseline results, we aim to establish a foundation for future research toward developing scale-consistent and reliable visual SLAM systems.
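To make the map-to-map assessment mentioned in the abstract concrete, here is a minimal sketch of a symmetric Chamfer distance between an estimated point cloud and a ground-truth point cloud. The abstract does not state which variant the benchmark uses (squared or unsquared distances, summed or averaged directions), so the definition below is an assumption chosen for illustration. The function name `chamfer_distance` and the toy point sets are hypothetical. The brute-force pairwise computation is only practical for small clouds.

```python
import numpy as np

def chamfer_distance(est_points, gt_points):
    """Symmetric Chamfer distance: mean nearest-neighbour distance from the
    estimate to the ground truth plus the same in the other direction.
    Assumed variant (unsquared, averaged); other definitions exist."""
    est = np.asarray(est_points, dtype=float)
    gt = np.asarray(gt_points, dtype=float)
    # Pairwise Euclidean distances, shape (N_est, N_gt).
    dists = np.linalg.norm(est[:, None, :] - gt[None, :, :], axis=2)
    est_to_gt = dists.min(axis=1).mean()  # each estimated point to nearest GT point
    gt_to_est = dists.min(axis=0).mean()  # each GT point to nearest estimated point
    return est_to_gt + gt_to_est

# Toy ground truth: the four corners of a unit square in the z = 0 plane.
gt_map = np.array([[0, 0, 0], [1, 0, 0], [0, 1, 0], [1, 1, 0]], dtype=float)

# A map reconstructed at twice the true scale no longer overlaps the ground
# truth, so its Chamfer distance is non-zero even though its shape is correct.
scaled_map = 2.0 * gt_map
print(chamfer_distance(gt_map, gt_map))      # identical maps
print(chamfer_distance(scaled_map, gt_map))  # scale error shows up in the metric
```

The toy example illustrates why a map-level metric is sensitive to scale: a reconstruction with the right geometry but the wrong scale is still penalized.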

Problem

Research questions and friction points this paper is trying to address.

scale inconsistency

monocular visual SLAM

large-scale indoor environments

scale drift

scale ambiguity
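The scale problems listed above can be illustrated with a small sketch. It estimates the scale factor that best maps an estimated trajectory onto ground truth, using the scale term of Umeyama's closed-form similarity alignment, and repeats the estimate over consecutive windows. A factor that changes between windows of one session indicates intra-session scale drift. Different global factors across sessions indicate inter-session scale ambiguity. This is not the paper's evaluation protocol, which the summary does not describe. The function names, the window size, and the synthetic helix trajectory are assumptions made for illustration.

```python
import numpy as np

def umeyama_scale(est, gt):
    """Scale c of the least-squares similarity transform gt ≈ c * R @ est + t
    (Umeyama's closed-form solution; only the scale is returned)."""
    est = np.asarray(est, dtype=float)
    gt = np.asarray(gt, dtype=float)
    e = est - est.mean(axis=0)          # centred estimated positions
    g = gt - gt.mean(axis=0)            # centred ground-truth positions
    cov = g.T @ e / len(est)            # 3x3 cross-covariance
    U, D, Vt = np.linalg.svd(cov)
    S = np.eye(3)
    if np.linalg.det(U) * np.linalg.det(Vt) < 0:
        S[2, 2] = -1.0                  # guard against a reflection
    var_est = (e ** 2).sum() / len(est)
    return np.trace(np.diag(D) @ S) / var_est

def windowed_scales(est, gt, window):
    """Per-window scale factors over non-overlapping windows of poses."""
    est = np.asarray(est, dtype=float)
    gt = np.asarray(gt, dtype=float)
    return [umeyama_scale(est[i:i + window], gt[i:i + window])
            for i in range(0, len(est) - window + 1, window)]

# Synthetic ground truth: a helix sampled at 200 positions.
t = np.linspace(0.0, 4.0 * np.pi, 200)
gt_traj = np.stack([np.cos(t), np.sin(t), 0.1 * t], axis=1)

# Hypothetical estimate: correct scale for the first half, then the
# reconstruction shrinks to half scale (a crude stand-in for scale drift).
est_traj = gt_traj.copy()
est_traj[100:] *= 0.5

scales = windowed_scales(est_traj, gt_traj, window=100)
print(scales)  # one scale factor per window; unequal values indicate drift
```

A single global alignment would average these factors and hide the drift, which is why a per-window estimate is used in the sketch.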

Innovation

Methods, ideas, or system contributions that make the work stand out.

scale consistency

monocular visual SLAM

large-scale indoor environments