🤖 AI Summary
This work addresses the poor generalization and limited safety of end-to-end visual navigation in unseen, cluttered, or narrow environments by proposing SaferPath, a hierarchical visual navigation framework. The approach combines high-level guidance from an existing end-to-end model with low-level, safety-constrained optimization and control: it first transforms visual observations into a traversable-area map, then refines the guidance trajectory with Model Predictive Stein Variational Evolution Strategy (MP-SVES), which generates safe trajectories in only a few iterations, and finally tracks the refined trajectory with a model predictive controller (MPC). In experiments with unseen obstacles, dense unstructured spaces, and narrow corridors, the proposed method outperforms representative baselines such as ViNT and NoMaD, achieving higher navigation success rates and fewer collisions.
📝 Abstract
Visual navigation is a core capability for mobile robots, yet end-to-end learning-based methods often struggle with generalization and safety in unseen, cluttered, or narrow environments. These limitations are especially pronounced in dense indoor settings, where collisions are likely and end-to-end models frequently fail. To address these limitations, we propose SaferPath, a hierarchical visual navigation framework that leverages learned guidance from existing end-to-end models and refines it through a safety-constrained optimization-control module. SaferPath transforms visual observations into a traversable-area map and refines guidance trajectories using Model Predictive Stein Variational Evolution Strategy (MP-SVES), efficiently generating safe trajectories in only a few iterations. The refined trajectories are tracked by a model predictive controller (MPC), ensuring robust navigation in complex environments. Extensive experiments in scenarios with unseen obstacles, dense unstructured spaces, and narrow corridors demonstrate that SaferPath consistently improves success rates and reduces collisions, outperforming representative baselines such as ViNT and NoMaD, and enabling safe navigation in challenging real-world settings.
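To make the refinement idea concrete, the sketch below shows a generic Stein variational particle update applied to a guidance trajectory. It is not the authors' MP-SVES: it uses plain Stein variational gradient descent with an analytic gradient, a single Gaussian "obstacle" in place of a traversable-area map, and no MPC tracking stage. All names (`cost_and_grad`, `svgd_step`, `refine`) and all parameter values are hypothetical and chosen only for illustration.

```python
import numpy as np

def cost_and_grad(trajs, guide, obstacle, radius=0.5, weight=5.0, lam=1.0):
    """Toy cost per trajectory and its gradient; trajs has shape (N, T, 2)."""
    diff = trajs - obstacle                      # waypoint offsets from the obstacle
    bump = np.exp(-np.sum(diff**2, axis=-1) / (2 * radius**2))  # (N, T) soft collision penalty
    dev = trajs - guide                          # deviation from the guidance trajectory
    cost = weight * bump.sum(axis=1) + 0.5 * lam * np.sum(dev**2, axis=(1, 2))
    grad = -weight * bump[..., None] * diff / radius**2 + lam * dev
    return cost, grad

def svgd_step(trajs, grad, step=0.05):
    """One Stein variational update with an RBF kernel (median bandwidth heuristic)."""
    n = trajs.shape[0]
    x = trajs.reshape(n, -1)
    g = grad.reshape(n, -1)
    pair = x[:, None, :] - x[None, :, :]         # pair[i, j] = x_i - x_j
    sq = np.sum(pair**2, axis=-1)
    h = np.median(sq) / np.log(n + 1) + 1e-8     # kernel bandwidth
    k = np.exp(-sq / h)
    attract = k @ (-g)                           # kernel-weighted descent directions
    repulse = (2.0 / h) * np.sum(k[..., None] * pair, axis=1)  # keeps particles spread out
    phi = (attract + repulse) / n
    return (x + step * phi).reshape(trajs.shape)

def refine(guide, obstacle, n_particles=16, iters=100, step=0.05, seed=0):
    """Perturb the guidance trajectory into particles, update them, return the cheapest."""
    rng = np.random.default_rng(seed)
    trajs = guide + 0.1 * rng.standard_normal((n_particles, *guide.shape))
    for _ in range(iters):
        trajs[:, 0], trajs[:, -1] = guide[0], guide[-1]   # pin start and goal
        _, grad = cost_and_grad(trajs, guide, obstacle)
        trajs = svgd_step(trajs, grad, step)
    trajs[:, 0], trajs[:, -1] = guide[0], guide[-1]
    cost, _ = cost_and_grad(trajs, guide, obstacle)
    return trajs[np.argmin(cost)]

# Straight-line guidance that passes almost through the toy obstacle.
guide = np.stack([np.linspace(0.0, 4.0, 9), np.zeros(9)], axis=1)
obstacle = np.array([2.0, 0.1])
best = refine(guide, obstacle)
```

In this toy setup the collision term pushes waypoints away from the obstacle while the deviation term keeps them near the guidance, and the kernel repulsion keeps the particle set from collapsing onto a single candidate. In a hierarchical pipeline like the one described above, the selected trajectory would then be handed to a tracking controller.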