🤖 AI Summary
To address the challenge of context-dependent user preference shifts in dynamic human-robot coexistence environments—where conventional static-reward reinforcement learning fails to adapt online—this paper proposes an online preference-adaptive navigation method that requires no retraining. The core innovation lies in a modulatable reward-teaching coupling mechanism, integrating multi-objective reinforcement learning (MORL), inverse-reinforcement-learning-inspired teaching embedding, and a preference-conditioned policy network, all embedded within a sim-to-real transfer framework. The approach enables zero-shot preference switching and real-time adjustment of teaching weights. Experimental evaluation on a dual-robot platform demonstrates significant improvements: 18.3% higher goal-reaching rate and 22.7% improved collision avoidance rate. Moreover, it accurately reproduces diverse, context-sensitive user preference behaviors, effectively overcoming two critical bottlenecks in personalized navigation—policy rigidity and context insensitivity.
📝 Abstract
Preference-aligned robot navigation in human environments is typically achieved through learning-based approaches, utilizing user feedback or demonstrations for personalization. However, personal preferences are subject to change and might even be context-dependent. Yet traditional reinforcement learning (RL) approaches with static reward functions often fall short in adapting to these varying user preferences, inevitably reflecting demonstrations once training is completed. This paper introduces a framework that combines multi-objective reinforcement learning (MORL) with demonstration-based learning. Our approach allows for dynamic adaptation to changing user preferences without retraining. It fluently modulates between reward-defined preference objectives and the amount of demonstration data reflection. Through rigorous evaluations, including a sim-to-real transfer on two robots, we demonstrate our framework's capability to reflect user preferences accurately while achieving high navigational performance in terms of collision avoidance and goal pursuance.