AI Summary
Traditional robot navigation in dynamic human-robot coexistence environments relies on reactive responses, rendering it incapable of anticipating sudden hazards, such as pedestrians abruptly entering through open doors.
Method: This paper proposes a zero-shot, language-driven cost-mapping framework leveraging Vision-Language Models (VLMs). It introduces natural language instructions as spatial cost signals into the navigation pipeline; the VLM interprets visual semantics and infers latent dynamic risks, which are fused with geometric maps to generate language-guided, risk-aware cost maps.
Contribution/Results: The approach shifts navigation from reactive obstacle avoidance to proactive hazard mitigation. Evaluations across diverse dynamic simulation scenarios demonstrate significantly improved navigation success rates and fewer hazard encounters compared to reactive baseline planners.
Abstract
Robots operating in human-centric or hazardous environments must proactively anticipate and mitigate dangers beyond basic obstacle detection. Traditional navigation systems often depend on static maps, which struggle to account for dynamic risks, such as a person emerging from a suddenly opening door. As a result, these systems tend to be reactive rather than anticipatory when handling dynamic hazards. Recent advancements in pre-trained large language models and vision-language models (VLMs) create new opportunities for proactive hazard avoidance. In this work, we propose a zero-shot language-as-cost mapping framework that leverages VLMs to interpret visual scenes, assess potential dynamic risks, and assign risk-aware navigation costs preemptively, enabling robots to anticipate hazards before they materialize. By integrating this language-based cost map with a geometric obstacle map, the robot not only identifies existing obstacles but also anticipates and proactively plans around potential hazards arising from environmental dynamics. Experiments in simulated and diverse dynamic environments demonstrate that the proposed method significantly improves navigation success rates and reduces hazard encounters, compared to reactive baseline planners. Code and supplementary materials are available at https://github.com/Taekmino/LaC.
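To make the idea of combining a language-based cost map with a geometric obstacle map concrete, the following is a minimal sketch and not the authors' implementation. It assumes the VLM output has already been projected onto the grid as a per-cell risk score in [0, 1]; the `risk` array, the `risk_weight` parameter, the `LETHAL` value, and the function names `fuse_cost_maps` and `plan` are all hypothetical and chosen for illustration only. A plain Dijkstra search stands in for whatever planner the paper uses.

```python
import heapq
import numpy as np

LETHAL = 255  # assumed cost value marking cells occupied by real geometry


def fuse_cost_maps(geometric, risk, risk_weight=100.0):
    """Add a soft, language-derived risk layer on top of a geometric cost map.

    geometric: 2D uint8 array, LETHAL where an obstacle is present.
    risk: 2D float array in [0, 1]; a hypothetical stand-in for risk scores
          that a VLM pipeline would assign to grid cells.
    """
    soft = np.clip(risk, 0.0, 1.0) * risk_weight
    # Anticipated risk raises the cost but never makes a free cell impassable.
    fused = np.minimum(geometric.astype(float) + soft, LETHAL - 1)
    fused[geometric >= LETHAL] = LETHAL  # real obstacles stay impassable
    return fused


def plan(cost, start, goal):
    """Dijkstra on a 4-connected grid; entering a cell costs 1 + its cost."""
    rows, cols = cost.shape
    dist = {start: 0.0}
    parent = {}
    queue = [(0.0, start)]
    while queue:
        d, cell = heapq.heappop(queue)
        if cell == goal:
            break
        if d > dist[cell]:
            continue  # stale queue entry
        r, c = cell
        for nxt in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            nr, nc = nxt
            if not (0 <= nr < rows and 0 <= nc < cols):
                continue
            if cost[nr, nc] >= LETHAL:
                continue
            nd = d + 1.0 + cost[nr, nc]
            if nd < dist.get(nxt, float("inf")):
                dist[nxt] = nd
                parent[nxt] = cell
                heapq.heappush(queue, (nd, nxt))
    if goal not in dist:
        return None
    path = [goal]
    while path[-1] != start:
        path.append(parent[path[-1]])
    return path[::-1]


# Toy corridor: 3 rows x 7 columns, with one real obstacle in a corner.
geometric = np.zeros((3, 7), dtype=np.uint8)
geometric[2, 6] = LETHAL

# Hypothetical risk layer: a door opens onto row 0 at column 3.
risk = np.zeros((3, 7))
risk[0, 3] = 1.0
risk[1, 3] = 0.5

fused = fuse_cost_maps(geometric, risk)
baseline = plan(geometric.astype(float), (0, 0), (0, 6))
risk_aware = plan(fused, (0, 0), (0, 6))
print("geometric only:", baseline)
print("with risk layer:", risk_aware)
```

In this toy setup the geometry-only plan runs straight past the door, while the fused map makes the planner swing to the far side of the corridor before anything has appeared in the doorway. Keeping the risk layer strictly below the lethal value is a design choice of this sketch: anticipated hazards discourage a route without blocking it outright.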