MASt3R-Nav: WayPixel Navigation in Relative 3D Maps

Date: 2026-05-22
Citations: 0
Influential: 0
AI Summary
Traditional visual navigation struggles to balance global geometric consistency with topological generalization, which limits its performance in complex environments. This work proposes a map representation based on pixel-level relative 3D connectivity: it builds a pixel correspondence graph from an image sequence, expressed in the relative coordinate frames of individual image pairs, and derives a "WayPixel Costmap" used for planning and control. Because the representation keeps geometric accuracy without requiring global geometric consistency, it avoids the main limitations of both conventional topological graphs and dense reconstructions. The authors report that the pixel-level costmap is a more accurate conditioning variable for control prediction than image-level and object-level representations, validated on four types of navigation tasks in simulation and through real-world demonstrations.
๐Ÿ“ Abstract
Visual navigation ability is strongly tied to its underlying representation of the world. Unlike classical 3D maps that require globally-consistent geometry, image- or object-relative topological graphs almost entirely do away with geometric understanding. But, this comes at the cost of navigation capability, often limiting it to merely teach-and-repeat. In this work, we propose a novel map representation in the form of pixel-relative connectivity, which is geometrically accurate but does not require global geometric consistency. Inspired by recent progress in 3D grounded image matching, we construct a map from an image sequence through inter-image connectivity based on pixel correspondences in the relative 3D coordinate systems of individual image pairs. We then use this pixel-level graph to perform global path planning by approximating and sparsifying intra-image pixel connectivity. Through this, we derive a ''WayPixel Costmap'' representation and train a controller conditioned on it to predict a trajectory rollout. We show that this dense pixel-level costmap based on relative geometry is a more accurate conditioning variable for control prediction than its image- and object-level counterparts. This enables a highly capable navigation system, as validated on four types of navigation tasks in the simulator and through real world demonstrations.
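The planning step described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: it only assumes that pixel nodes are joined by intra-image edges (weighted here by a hypothetical relative 3D distance) and inter-image correspondence edges (given zero cost here), and that a per-pixel cost-to-goal is obtained with a shortest-path search. The function names `build_graph`, `cost_to_goal`, and `costs_for_image`, the node encoding, and all edge weights are assumptions made for illustration.

```python
import heapq
from collections import defaultdict

# A node is (image_index, pixel_id). This encoding is hypothetical.

def build_graph(intra_edges, inter_edges):
    """Build an undirected weighted graph from (node_u, node_v, weight) triples."""
    graph = defaultdict(list)
    for u, v, w in list(intra_edges) + list(inter_edges):
        graph[u].append((v, w))
        graph[v].append((u, w))
    return graph

def cost_to_goal(graph, goal):
    """Dijkstra from the goal node: shortest path cost for every reachable node."""
    dist = {goal: 0.0}
    heap = [(0.0, goal)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        for v, w in graph[u]:
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist

def costs_for_image(dist, image_index):
    """Collect per-pixel costs for one image (a sparse stand-in for a costmap)."""
    return {node[1]: c for node, c in dist.items() if node[0] == image_index}

# Toy map: two images with two pixel nodes each.
intra = [((0, 0), (0, 1), 1.0), ((1, 0), (1, 1), 2.0)]  # assumed relative 3D distances
inter = [((0, 1), (1, 0), 0.0)]  # assumed zero-cost pixel correspondence
graph = build_graph(intra, inter)
costs = costs_for_image(cost_to_goal(graph, goal=(1, 1)), image_index=0)
print(costs)
```

In this toy map, pixel 1 of image 0 is cheaper than pixel 0 because it carries the correspondence into image 1, where the goal lies. A controller conditioned on such per-pixel costs would, under these assumptions, be steered toward the lower-cost pixels.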
Problem

Research questions and friction points this paper is trying to address.

visual navigation
3D map representation
geometric consistency
pixel-relative connectivity
topological graph
Innovation

Methods, ideas, or system contributions that make the work stand out.

pixel-relative connectivity
relative 3D maps
WayPixel Costmap
visual navigation
3D grounded image matching
Authors
Vansh Garg
Robotics Research Center, IIIT-Hyderabad, India
Rohit Jayanti
Graduate Researcher, IIIT-Hyderabad
Visual SLAM, Structure-from-Motion, 3D Scene Understanding
Krish Pandya
Robotics Research Center, IIIT-Hyderabad, India
Sarthak Chittawar
Robotics Research Center, IIIT-Hyderabad, India
Siddharth Tourani
University of Heidelberg
Muhammad Haris Khan
Faculty at Mohamed Bin Zayed University of Artificial Intelligence (MBZUAI) - UAE
Domain Generalization, Domain Adaptation, Landmark Detection, Model Calibration, Few-shot Learning
Sourav Garg
(former) Research Fellow, Uni. Adelaide
Robotics, Computer Vision, Deep Learning
Madhava Krishna
Robotics Research Center, IIIT-Hyderabad, India