Learning to Localize Reference Trajectories in Image-Space for Visual Navigation

📅 2026-02-21
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This work addresses the limitations of existing visual navigation approaches that rely on robot-specific training, camera calibration, or pose estimation by proposing a cross-platform, zero-shot navigation method. The approach generates robot-agnostic visual guidance signals by localizing reference RGB trajectory points directly in image space, thereby decoupling perception from action and enabling any robot to navigate using only ordinary video inputs. Key technical components include image-space coordinate prediction, reference trajectory localization, and integration with local planning, all without requiring camera calibration or explicit pose estimation. Moreover, the method supports cross-trajectory training to improve robustness to viewpoint and camera variations. Experiments report 94-98% success rates on forward navigation tasks, surpassing current methods by 20-50 percentage points, and over fivefold improvement on challenging maneuvers such as backward navigation.

📝 Abstract
We present LoTIS, a model for visual navigation that provides robot-agnostic image-space guidance by localizing a reference RGB trajectory in the robot's current view, without requiring camera calibration, poses, or robot-specific training. Instead of predicting actions tied to specific robots, we predict the image-space coordinates of the reference trajectory as they would appear in the robot's current view. This creates robot-agnostic visual guidance that easily integrates with local planning. Consequently, our model's predictions provide guidance zero-shot across diverse embodiments. By decoupling perception from action and learning to localize trajectory points rather than imitate behavioral priors, we enable a cross-trajectory training strategy for robustness to viewpoint and camera changes. We outperform state-of-the-art methods by 20-50 percentage points in success rate on conventional forward navigation, achieving 94-98% success rate across diverse sim and real environments. Furthermore, we achieve over 5x improvements on challenging tasks where baselines fail, such as backward traversal. The system is straightforward to use: we show how even a video from a phone camera directly enables different robots to navigate to any point on the trajectory. Videos, demo, and code are available at https://finnbusch.com/lotis.
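The abstract describes predicting the image-space coordinates of reference trajectory points in the robot's current view and handing them to a local planner. As an illustration of how such robot-agnostic pixel targets could drive a simple controller (this is a hedged sketch, not the authors' implementation; the function name, the proportional-control scheme, and all parameters are hypothetical):

```python
def steering_from_trajectory(points, image_width):
    """Turn predicted image-space trajectory points into a normalized
    steering command in [-1, 1] (negative = steer left).

    points: list of (u, v) pixel coordinates of the reference trajectory
            as localized in the robot's current view, ordered from the
            nearest point to the farthest.
    image_width: width of the current camera image in pixels.
    """
    if not points:
        raise ValueError("need at least one localized trajectory point")
    # Use the nearest localized point as the immediate steering target.
    u, _ = points[0]
    center = image_width / 2.0
    # Proportional control on the horizontal pixel offset, normalized by
    # half the image width so the command is resolution-agnostic. This is
    # the "easily integrates with local planning" step, reduced to its
    # simplest possible form for illustration.
    return max(-1.0, min(1.0, (u - center) / center))

# Example: nearest trajectory point localized 160 px right of center in a
# 640 px wide image -> steer right at half strength.
cmd = steering_from_trajectory([(480, 300), (500, 250)], image_width=640)
# cmd == 0.5
```

Because the controller consumes only pixel coordinates, it needs no camera calibration or robot-specific tuning, which is the property the abstract highlights; a real system would feed the localized points into a proper local planner rather than a one-term proportional law.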
Problem

Research questions and friction points this paper is trying to address.

visual navigation
robot-agnostic
trajectory localization
image-space guidance
zero-shot generalization
Innovation

Methods, ideas, or system contributions that make the work stand out.

visual navigation
trajectory localization
robot-agnostic
image-space guidance
zero-shot generalization
Finn Lukas Busch
Division of Robotics, Perception, and Learning, KTH Royal Institute of Technology, Sweden; also affiliated with Digital Futures
Matti Vahs
Division of Robotics, Perception, and Learning, KTH Royal Institute of Technology, Sweden; also affiliated with Digital Futures
Quantao Yang
Division of Robotics, Perception, and Learning, KTH Royal Institute of Technology, Sweden; also affiliated with Digital Futures
Jesús Gerardo Ortega Peimbert
Division of Robotics, Perception, and Learning, KTH Royal Institute of Technology, Sweden; also affiliated with Digital Futures
Yixi Cai
Postdoctoral Fellow, Division of Robotics, Perception and Learning, KTH
Robotics · LiDAR · Mapping
Jana Tumova
KTH
Formal Methods · Robotics
Olov Andersson
Assistant Professor at KTH Royal Institute of Technology. Previously: ASL@ETH Zurich
Robot Learning · Autonomous Robots · Motion Planning · Mapping · Navigation