AI Summary
This work addresses the challenge of integrating environment-dependent natural language preferences into robotic navigation control. The authors propose a context-aware navigation framework that combines foundation models with multi-objective reinforcement learning: a vision-language model interprets scene context, while a large language model translates user-provided natural language feedback into interpretable behavioral rules, which are then mapped to preference vectors that dynamically modulate a pretrained policy. This approach provides an end-to-end mapping from natural language preferences to navigation behaviors and introduces an updatable rule-memory mechanism that improves the transparency and controllability of adaptive actions. Experimental results demonstrate that the system accurately captures user intent, generates consistent preferences, and achieves safe, efficient, and human-aligned navigation across component-level evaluations, user studies, and real-world indoor deployments.
Abstract
Robots operating in human-shared environments must not only achieve task-level navigation objectives such as safety and efficiency, but also adapt their behavior to human preferences. However, because human preferences are typically expressed in natural language and depend on environmental context, it is difficult to integrate them directly into low-level robot control policies. In this work, we present a pipeline that enables robots to understand and apply context-dependent navigation preferences by combining foundation models with a Multi-Objective Reinforcement Learning (MORL) navigation policy. In this way, our approach integrates high-level semantic reasoning with low-level motion control. A Vision-Language Model (VLM) extracts structured environmental context from onboard visual observations, while a Large Language Model (LLM) converts natural language user feedback into interpretable, context-dependent behavioral rules stored in a persistent but updatable rule memory. A preference translation module then maps contextual information and stored rules into numerical preference vectors that parameterize a pretrained MORL policy for real-time navigation adaptation. We evaluate the proposed framework through quantitative component-level evaluations, a user study, and real-world robot deployments in various indoor environments. Our results demonstrate that the system reliably captures user intent, generates consistent preference vectors, and enables controllable behavior adaptation across diverse contexts. Overall, the proposed pipeline improves the adaptability, transparency, and usability of robots operating in shared human environments, while maintaining safe and responsive real-time control.
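To make the data flow concrete, the rule-memory and preference-translation steps described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the objective names (`safety`, `efficiency`, `comfort`), the context labels, and the simple "sum matching rules, then normalize" translation are all assumptions made for the example; in the paper, context comes from a VLM and rules from an LLM.

```python
from dataclasses import dataclass, field

# Illustrative MORL objectives (names are assumptions, not from the paper).
OBJECTIVES = ("safety", "efficiency", "comfort")

@dataclass
class Rule:
    """A context-dependent behavioral rule distilled from user feedback."""
    context: str   # scene label a VLM might produce, e.g. "crowded_hallway"
    weights: dict  # objective name -> preference weight

@dataclass
class RuleMemory:
    """Persistent but updatable store of behavioral rules."""
    rules: list = field(default_factory=list)

    def update(self, rule: Rule) -> None:
        # Newer feedback overrides older rules for the same context.
        self.rules = [r for r in self.rules if r.context != rule.context]
        self.rules.append(rule)

    def lookup(self, context: str) -> list:
        return [r for r in self.rules if r.context == context]

def preference_vector(memory: RuleMemory, context: str) -> list:
    """Map the current context and stored rules to a normalized
    preference vector that parameterizes the pretrained MORL policy."""
    matched = memory.lookup(context)
    if not matched:
        # Fall back to a uniform preference when no rule applies.
        raw = {k: 1.0 for k in OBJECTIVES}
    else:
        raw = {k: sum(r.weights.get(k, 0.0) for r in matched)
               for k in OBJECTIVES}
    total = sum(raw.values())
    return [raw[k] / total for k in OBJECTIVES]

memory = RuleMemory()
# Hypothetical rule an LLM might distill from "slow down near people".
memory.update(Rule("crowded_hallway",
                   {"safety": 3.0, "efficiency": 1.0, "comfort": 1.0}))

print(preference_vector(memory, "crowded_hallway"))  # -> [0.6, 0.2, 0.2]
print(preference_vector(memory, "empty_room"))       # uniform fallback
```

Because the preference vector only *parameterizes* a pretrained policy rather than retraining it, the robot's behavior can be retargeted at runtime as soon as a rule is added or replaced in memory.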