MASt3R-Nav: WayPixel Navigation in Relative 3D Maps

📅 2026-05-22

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Classical 3D maps require globally consistent geometry, while image- or object-relative topological graphs discard most geometric information, which often limits navigation to teach-and-repeat. This work proposes a map representation based on pixel-level relative 3D connectivity: it builds a pixel correspondence graph from an image sequence, with each correspondence expressed in the relative coordinate frame of an image pair, and derives a “WayPixel Costmap” that conditions a learned controller. Because the geometry is accurate locally but is never forced into global consistency, the approach avoids the main limitations of both topological graphs and dense reconstructions. The authors report that the pixel-level costmap is a more accurate conditioning variable for control prediction than image-level and object-level representations, validated on four types of navigation tasks in simulation and in real-world demonstrations.

📝 Abstract

Visual navigation ability is strongly tied to the underlying representation of the world. Unlike classical 3D maps that require globally consistent geometry, image- or object-relative topological graphs almost entirely do away with geometric understanding. However, this comes at the cost of navigation capability, often limiting it to merely teach-and-repeat. In this work, we propose a novel map representation in the form of pixel-relative connectivity, which is geometrically accurate but does not require global geometric consistency. Inspired by recent progress in 3D grounded image matching, we construct a map from an image sequence through inter-image connectivity based on pixel correspondences in the relative 3D coordinate systems of individual image pairs. We then use this pixel-level graph to perform global path planning by approximating and sparsifying intra-image pixel connectivity. Through this, we derive a “WayPixel Costmap” representation and train a controller conditioned on it to predict a trajectory rollout. We show that this dense pixel-level costmap based on relative geometry is a more accurate conditioning variable for control prediction than its image- and object-level counterparts. This enables a highly capable navigation system, as validated on four types of navigation tasks in simulation and through real-world demonstrations.
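
To make the idea of a pixel-level graph and a per-image costmap concrete, here is a minimal Python sketch. It is not the authors' implementation. It rests on several assumptions that the abstract does not confirm: inter-image edges between matched pixels are given zero cost, intra-image connectivity is approximated by fully connecting the matched pixels of each image, edge weights are Euclidean distances in each image's own local 3D frame, and the costmap is the shortest-path cost to a goal pixel. The names `build_pixel_graph`, `cost_to_goal`, and `waypixel_costmap` are hypothetical.

```python
import heapq
import math
from collections import defaultdict

# A graph node is (image_index, (row, col)). All names are hypothetical.

def build_pixel_graph(points3d, matches):
    """points3d[i] maps pixel -> (x, y, z) in image i's own local frame.
    matches is a list of (i, pixel_i, j, pixel_j) correspondences."""
    graph = defaultdict(dict)

    def connect(a, b, w):
        # Undirected edge; keep the cheaper weight if the edge already exists.
        graph[a][b] = min(w, graph[a].get(b, math.inf))
        graph[b][a] = min(w, graph[b].get(a, math.inf))

    # Inter-image edges: matched pixels are assumed to depict the same
    # 3D point, so moving between them costs nothing (an assumption).
    for i, p, j, q in matches:
        connect((i, p), (j, q), 0.0)

    # Intra-image edges: connect the sparse pixels of each image, weighted
    # by distance in that image's local frame. No global frame is needed.
    for i, pts in points3d.items():
        pixels = sorted(pts)
        for a in range(len(pixels)):
            for b in range(a + 1, len(pixels)):
                w = math.dist(pts[pixels[a]], pts[pixels[b]])
                connect((i, pixels[a]), (i, pixels[b]), w)
    return graph

def cost_to_goal(graph, goal):
    """Dijkstra from the goal node; returns node -> shortest path cost."""
    dist = {goal: 0.0}
    heap = [(0.0, goal)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, math.inf):
            continue  # stale heap entry
        for v, w in graph[u].items():
            nd = d + w
            if nd < dist.get(v, math.inf):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist

def waypixel_costmap(dist, image_index):
    """Sparse costmap for one image: pixel -> cost to reach the goal."""
    return {pix: d for (i, pix), d in dist.items() if i == image_index}

# Toy example: two images, two pixels each, one cross-image match.
points3d = {
    0: {(10, 10): (0.0, 0.0, 1.0), (10, 50): (1.0, 0.0, 1.0)},
    1: {(12, 8): (0.0, 0.0, 1.0), (12, 40): (0.0, 0.0, 3.0)},
}
matches = [(0, (10, 50), 1, (12, 8))]
graph = build_pixel_graph(points3d, matches)
dist = cost_to_goal(graph, goal=(1, (12, 40)))
costmap = waypixel_costmap(dist, image_index=0)
print(costmap)
```

In this toy example, pixel `(10, 50)` of image 0 has a lower cost than pixel `(10, 10)` because it is matched to image 1 and so lies closer to the goal. A controller conditioned on such a costmap could steer toward low-cost pixels. The dense costmap and learned controller described in the abstract are beyond the scope of this sketch.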

Problem

Research questions and friction points this paper is trying to address.

visual navigation

3D map representation

geometric consistency

pixel-relative connectivity

topological graph

Innovation

Methods, ideas, or system contributions that make the work stand out.

pixel-relative connectivity

relative 3D maps

WayPixel Costmap