Have We Mastered Scale in Deep Monocular Visual SLAM? The ScaleMaster Dataset and Benchmark

📅 2026-02-20
🤖 AI Summary
This work addresses intra-session scale drift and inter-session scale ambiguity in deep monocular visual SLAM within large-scale indoor environments, two failure modes that existing benchmarks do not evaluate effectively. To this end, the authors propose ScaleMaster, the first large-scale indoor SLAM benchmark specifically designed for assessing scale consistency. It features multi-floor layouts, long trajectories, repeated viewpoints, and low-texture regions, complemented by high-fidelity 3D ground truth and map-level metrics such as Chamfer distance, which goes beyond conventional trajectory-only evaluation. Systematic benchmarking reveals that state-of-the-art methods commonly suffer from severe scale failures in realistic, complex settings, and the dataset is positioned as a foundation for future research in scale-consistent SLAM.

๐Ÿ“ Abstract
Recent advances in deep monocular visual Simultaneous Localization and Mapping (SLAM) have achieved impressive accuracy and dense reconstruction capabilities, yet their robustness to scale inconsistency in large-scale indoor environments remains largely unexplored. Existing benchmarks are limited to room-scale or structurally simple settings, leaving critical issues of intra-session scale drift and inter-session scale ambiguity insufficiently addressed. To fill this gap, we introduce the ScaleMaster Dataset, the first benchmark explicitly designed to evaluate scale consistency under challenging scenarios such as multi-floor structures, long trajectories, repetitive views, and low-texture regions. We systematically analyze the vulnerability of state-of-the-art deep monocular visual SLAM systems to scale inconsistency, providing both quantitative and qualitative evaluations. Crucially, our analysis extends beyond traditional trajectory metrics to include a direct map-to-map quality assessment using metrics like Chamfer distance against high-fidelity 3D ground truth. Our results reveal that while recent deep monocular visual SLAM systems demonstrate strong performance on existing benchmarks, they suffer from severe scale-related failures in realistic, large-scale indoor environments. By releasing the ScaleMaster dataset and baseline results, we aim to establish a foundation for future research toward developing scale-consistent and reliable visual SLAM systems.
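The map-to-map assessment mentioned in the abstract can be illustrated with a small sketch of the Chamfer distance between two point sets. This is not the benchmark's evaluation code: the function names are hypothetical, and the variant shown (sum of the two mean nearest-neighbor distances, unsquared) is only one common convention that the paper may or may not use. The brute-force search is only suitable for tiny clouds; real maps would need a spatial index such as a KD-tree.

```python
import math


def one_sided_mean_nn(src, dst):
    """Mean distance from each point in src to its nearest neighbor in dst."""
    return sum(min(math.dist(p, q) for q in dst) for p in src) / len(src)


def chamfer_distance(cloud_a, cloud_b):
    """Symmetric Chamfer distance (assumed variant: sum of both one-sided means)."""
    return one_sided_mean_nn(cloud_a, cloud_b) + one_sided_mean_nn(cloud_b, cloud_a)


# Toy ground-truth map and a reconstruction whose scale is 20% too large.
ground_truth = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]
reconstruction = [(1.2 * x, 1.2 * y, 1.2 * z) for x, y, z in ground_truth]

print(chamfer_distance(ground_truth, ground_truth))    # identical maps
print(chamfer_distance(ground_truth, reconstruction))  # scale error raises the distance
```

In this toy case the scaled reconstruction scores worse than the identical map even though its shape is correct. Unlike a trajectory error, the metric is computed on the reconstructed geometry itself.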
Problem

Research questions and friction points this paper is trying to address.

scale inconsistency
monocular visual SLAM
large-scale indoor environments
scale drift
scale ambiguity
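As a rough illustration of intra-session scale drift, the sketch below estimates a local scale factor per trajectory window as the ratio of ground-truth path length to estimated path length. This is an assumed, simplified diagnostic, not the metric used in the paper. The function names, the window size, and the assumption that both trajectories are already time-aligned position lists are all hypothetical.

```python
import math


def path_length(points):
    """Sum of distances between consecutive positions."""
    return sum(math.dist(p, q) for p, q in zip(points, points[1:]))


def windowed_scale(estimated, ground_truth, window):
    """Local scale (ground-truth length / estimated length) per non-overlapping window."""
    scales = []
    for start in range(0, len(estimated) - window, window):
        est_len = path_length(estimated[start:start + window + 1])
        gt_len = path_length(ground_truth[start:start + window + 1])
        if est_len > 0:  # skip windows in which the estimate did not move
            scales.append(gt_len / est_len)
    return scales


# Toy example: the ground truth moves 1 m per step along x; the estimate
# tracks it for four steps, then its step size shrinks to 0.5 m.
gt_xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
est_xs = [0.0, 1.0, 2.0, 3.0, 4.0, 4.5, 5.0, 5.5, 6.0]
gt_traj = [(x, 0.0, 0.0) for x in gt_xs]
est_traj = [(x, 0.0, 0.0) for x in est_xs]

scales = windowed_scale(est_traj, gt_traj, window=4)
print(scales)  # a constant list would indicate no drift within the session
```

A single global scale alignment would average over both windows and hide the change, which is why a per-window view can be informative for drift.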
Innovation

Methods, ideas, or system contributions that make the work stand out.

scale consistency
monocular visual SLAM
large-scale indoor environments
Chamfer distance
ScaleMaster dataset
Hyoseok Ju
Department of Robotics and Mechatronics Engineering, DGIST, Daegu, Republic of Korea
Bokeon Suh
Department of Robotics and Mechatronics Engineering, DGIST, Daegu, Republic of Korea
Giseop Kim
Assistant Professor, Dept of Robotics and Mechatronics Eng, DGIST
Mobile Robotics, Field Robotics, SLAM