🤖 AI Summary
This work proposes a path-conditioned deep reinforcement learning approach for local navigation that addresses the limitations of traditional hierarchical navigation systems. Conventional local planners suffer significant performance degradation under perceptual uncertainty or infeasible global paths due to their strong reliance on high-quality global trajectories and lack of global context. In contrast, the proposed method treats the reference path as implicit contextual input and is trained end-to-end using only sparse goal-reaching rewards, without requiring explicit path-tracking losses. This design enables the agent to achieve higher navigation efficiency when provided with accurate global paths, while maintaining robustness even when the reference path is severely degraded or entirely absent. By alleviating the strict dependency on precise global trajectories, the approach substantially enhances practicality and adaptability in long-horizon navigation scenarios characterized by uncertainty.
📝 Abstract
Long-range navigation is commonly addressed through hierarchical pipelines in which a global planner generates a path that is decomposed into waypoints and followed sequentially by a local planner. These systems are sensitive to global path quality: inaccurate remote sensing data can produce locally infeasible waypoints that degrade local execution, while the limited global context available to the local planner hinders long-range efficiency. To address these issues, we propose a reinforcement learning-based local navigation policy that leverages path information as contextual guidance. The policy is conditioned on reference path observations and trained with a reward function based mainly on goal-reaching objectives, without any explicit path-following reward. Through this implicit conditioning, the policy learns to opportunistically exploit path information while remaining robust to misleading or degraded guidance. Experimental results show that the proposed approach significantly improves navigation efficiency when high-quality paths are available and maintains baseline-level performance when path observations are severely degraded or entirely absent. These properties make the method particularly well-suited for long-range navigation scenarios in which high-level plans are approximate and local execution must remain adaptive to uncertainty.
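The core design described above, conditioning the policy on reference path observations while rewarding only goal-reaching, can be sketched as follows. This is a minimal illustrative sketch, not the paper's implementation: the observation layout, array shapes, waypoint count, and reward magnitudes are all assumptions introduced here for clarity.

```python
import numpy as np

def build_observation(lidar, goal_vec, path_points):
    """Concatenate sensor readings, a relative goal vector, and sampled
    reference-path waypoints into one observation (illustrative layout).

    The reference path enters only as extra observation dimensions;
    there is no tracking loss, so the policy may ignore it entirely.
    path_points: (K, 2) array of nearby path waypoints in the robot
    frame, zero-padded when the path is degraded or absent.
    """
    return np.concatenate([lidar, goal_vec, path_points.ravel()])

def sparse_reward(robot_pos, goal_pos, collided, goal_radius=0.5):
    """Sparse goal-reaching reward with no explicit path-following term
    (magnitudes are assumed, not taken from the paper)."""
    if collided:
        return -1.0
    if np.linalg.norm(robot_pos - goal_pos) < goal_radius:
        return 1.0
    return 0.0
```

Because the reward never references the path, path quality affects only what the policy observes, never what it is optimized for, which is what allows graceful degradation when the reference path is misleading or missing.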