MR.ScaleMaster: Scale-Consistent Collaborative Mapping from Crowd-Sourced Monocular Videos

📅 2026-04-13

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses three scale-related problems in collaborative monocular mapping from crowd-sourced videos: scale collapse caused by erroneous loop closures, long-term scale drift, and inter-device scale inconsistency. The proposed system, MR.ScaleMaster, includes a scale-collapse early-warning mechanism (the Scale Collapse Alarm) that, in the reported experiments, rejects all false-positive loop closures while preserving every valid constraint. It extends the conventional SE(3) formulation to an explicit Sim(3) anchor model that jointly estimates per-session scale. MR.ScaleMaster also provides a plug-and-play interface that integrates diverse monocular SLAM backends, such as MASt3R-SLAM, pi3, and VGGT-SLAM 2.0, without modifying their internal pipelines. On KITTI sequences with up to 15 agents, the Sim(3) formulation reduces the absolute trajectory error (ATE) by 7.2× relative to the SE(3) baseline, and the system demonstrates unified dense mapping across heterogeneous monocular SLAM systems.

📝 Abstract

Crowd-sourced cooperative mapping from monocular cameras promises scalable 3D reconstruction without specialized sensors, yet remains hindered by two scale-specific failure modes: abrupt scale collapse from false-positive loop closures in repetitive environments, and gradual scale drift over long trajectories combined with per-robot scale ambiguity, which together prevent direct multi-session fusion. We present MR.ScaleMaster, a cooperative mapping system for crowd-sourced monocular videos that addresses both failure modes. MR.ScaleMaster introduces three key mechanisms. First, a Scale Collapse Alarm rejects spurious loop closures before they corrupt the pose graph. Second, a Sim(3) anchor node formulation generalizes the classical SE(3) framework to explicitly estimate per-session scale, resolving per-robot scale ambiguity and enforcing global scale consistency. Third, a modular, open-source, plug-and-play interface enables any monocular reconstruction model to integrate without backend modification. On KITTI sequences with up to 15 agents, the Sim(3) formulation achieves a 7.2× ATE reduction over the SE(3) baseline, and the alarm rejects all false-positive loops while preserving every valid constraint. We further demonstrate heterogeneous multi-robot dense mapping fusing MASt3R-SLAM, pi3, and VGGT-SLAM 2.0 within a single unified map.
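To make the Sim(3) anchor idea concrete, the following is a minimal sketch of how a per-session similarity transform (scale, rotation, translation) can map session-local coordinates into a shared global frame. It is not the paper's implementation: the class name `Sim3Anchor`, its methods, the helper functions, and all numeric values are hypothetical and chosen only for illustration. In a real system the anchor parameters would be estimated by pose-graph optimization rather than set by hand.

```python
import math
from dataclasses import dataclass


def mat_vec(R, v):
    """Multiply a 3x3 matrix (tuple of rows) by a 3-vector."""
    return tuple(sum(R[i][k] * v[k] for k in range(3)) for i in range(3))


def mat_mul(A, B):
    """Multiply two 3x3 matrices stored as tuples of rows."""
    return tuple(
        tuple(sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3))
        for i in range(3)
    )


def rot_z(theta):
    """Rotation by theta radians about the z axis."""
    c, s = math.cos(theta), math.sin(theta)
    return ((c, -s, 0.0), (s, c, 0.0), (0.0, 0.0, 1.0))


IDENTITY = ((1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0))


@dataclass(frozen=True)
class Sim3Anchor:
    """Hypothetical per-session anchor: p_global = scale * R @ p_local + t."""
    scale: float
    R: tuple
    t: tuple

    def apply_to_point(self, p_local):
        rp = mat_vec(self.R, p_local)
        return tuple(self.scale * rp[i] + self.t[i] for i in range(3))

    def apply_to_pose(self, R_local, t_local):
        # Orientation is rotated only; the camera position is also scaled
        # and translated, which is what an SE(3) anchor cannot express.
        return mat_mul(self.R, R_local), self.apply_to_point(t_local)


# Two sessions observe the same physical location, but each monocular
# reconstruction lives in its own frame with its own arbitrary scale.
anchor_a = Sim3Anchor(scale=2.0, R=IDENTITY, t=(0.0, 0.0, 0.0))
anchor_b = Sim3Anchor(scale=0.5, R=rot_z(math.pi / 2), t=(1.0, 0.0, 0.0))

global_a = anchor_a.apply_to_point((1.0, 0.0, 0.0))
global_b = anchor_b.apply_to_point((0.0, -2.0, 0.0))

print("session A point in global frame:", global_a)
print("session B point in global frame:", global_b)
```

With these illustrative anchors, both session-local points map to the same global location up to floating-point error. Fixing `scale` to 1 would reduce each anchor to an SE(3) transform, which cannot reconcile sessions whose reconstructions differ in scale.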

Problem

Research questions and friction points this paper is trying to address.

scale collapse

scale drift

monocular SLAM

multi-session fusion

scale ambiguity

Innovation

Methods, ideas, or system contributions that make the work stand out.

scale consistency

Sim(3) pose graph

loop closure rejection