🤖 AI Summary
Traditional robotic visual navigation relies on globally consistent 3D maps or learned controllers, which are computationally expensive and generalize poorly across environments. To address these limitations, we propose an RGB-only, object-level topometric navigation pipeline that enables zero-shot, long-horizon navigation without 3D maps or pre-trained controllers. Our method combines global topological path planning over object-level sub-goals with local metric trajectory control, predicts the local trajectory continuously from foundation-model-based monocular depth and traversability estimates, and automatically switches to a baseline controller when necessary. Evaluated in both simulation and real-world settings, our approach outperforms existing state-of-the-art methods and demonstrates robustness, open-set applicability, and practical deployability.
📝 Abstract
Visual navigation in robotics traditionally relies on globally consistent 3D maps or learned controllers, which can be computationally expensive and difficult to generalize across diverse environments. In this work, we present a novel RGB-only, object-level topometric navigation pipeline that enables zero-shot, long-horizon robot navigation without requiring 3D maps or pre-trained controllers. Our approach integrates global topological path planning with local metric trajectory control, allowing the robot to navigate towards object-level sub-goals while avoiding obstacles. We address key limitations of previous methods by continuously predicting the local trajectory from monocular depth and traversability estimates, and by incorporating an auto-switching mechanism that falls back to a baseline controller when necessary. The system is built on foundation models, which gives it open-set applicability without domain-specific fine-tuning. We demonstrate the effectiveness of our method in both simulated environments and real-world tests, highlighting its robustness and deployability. Our approach outperforms existing state-of-the-art methods, offering a more adaptable and effective solution for visual navigation in open-set environments. The source code is publicly available at https://github.com/podgorki/TANGO.
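To make the two-level structure described in the abstract more concrete, the following is a minimal illustrative sketch and not the released TANGO code. It assumes the topological map can be represented as a weighted graph of object nodes and that the decision to fall back can be reduced to a threshold on the fraction of the image judged traversable. The function names `plan_object_path` and `select_controller`, the graph layout, and the `min_fraction` threshold are all hypothetical choices made for illustration.

```python
import heapq


def plan_object_path(graph, start, goal):
    """Shortest path over an object-level graph (hypothetical layout).

    graph maps each object node to a dict of {neighbor: edge_cost}.
    Returns the list of object sub-goals from start to goal, or None.
    """
    frontier = [(0.0, start, [start])]
    visited = set()
    while frontier:
        cost, node, path = heapq.heappop(frontier)
        if node == goal:
            return path
        if node in visited:
            continue
        visited.add(node)
        for neighbor, edge_cost in graph.get(node, {}).items():
            if neighbor not in visited:
                heapq.heappush(
                    frontier, (cost + edge_cost, neighbor, path + [neighbor])
                )
    return None


def select_controller(traversable_fraction, min_fraction=0.2):
    """Assumed switching rule: use the local trajectory controller when
    enough of the view is traversable, otherwise use the fallback."""
    if traversable_fraction >= min_fraction:
        return "trajectory"
    return "fallback"


# Toy object graph: nodes are detected objects, costs are arbitrary.
toy_graph = {
    "door": {"chair": 1.0, "table": 4.0},
    "chair": {"table": 1.0},
    "table": {},
}
sub_goals = plan_object_path(toy_graph, "door", "table")
mode = select_controller(traversable_fraction=0.5)
```

In this sketch the global planner only produces an ordered list of object sub-goals, while the per-frame choice of controller is made independently from the current traversability estimate. The actual switching criterion used by the authors is not stated in the abstract, so the threshold rule here should be read as a placeholder.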