Learning Adaptive Multi-Objective Robot Navigation Incorporating Demonstrations

📅 2024-04-07

📈 Citations: 0

✨ Influential: 0

career value

216K/year

🤖 AI Summary

To address the challenge of context-dependent user preference shifts in dynamic human-robot coexistence environments—where conventional static-reward reinforcement learning fails to adapt online—this paper proposes an online preference-adaptive navigation method that requires no retraining. The core innovation lies in a modulatable reward-teaching coupling mechanism, integrating multi-objective reinforcement learning (MORL), inverse-reinforcement-learning-inspired teaching embedding, and a preference-conditioned policy network, all embedded within a sim-to-real transfer framework. The approach enables zero-shot preference switching and real-time adjustment of teaching weights. Experimental evaluation on a dual-robot platform demonstrates significant improvements: 18.3% higher goal-reaching rate and 22.7% improved collision avoidance rate. Moreover, it accurately reproduces diverse, context-sensitive user preference behaviors, effectively overcoming two critical bottlenecks in personalized navigation—policy rigidity and context insensitivity.

Technology Category

Application Category

📝 Abstract

Preference-aligned robot navigation in human environments is typically achieved through learning-based approaches, utilizing user feedback or demonstrations for personalization. However, personal preferences are subject to change and might even be context-dependent. Yet traditional reinforcement learning (RL) approaches with static reward functions often fall short in adapting to these varying user preferences, inevitably reflecting demonstrations once training is completed. This paper introduces a framework that combines multi-objective reinforcement learning (MORL) with demonstration-based learning. Our approach allows for dynamic adaptation to changing user preferences without retraining. It fluently modulates between reward-defined preference objectives and the amount of demonstration data reflection. Through rigorous evaluations, including a sim-to-real transfer on two robots, we demonstrate our framework's capability to reflect user preferences accurately while achieving high navigational performance in terms of collision avoidance and goal pursuance.

Problem

Research questions and friction points this paper is trying to address.

Adapting robot navigation to changing user preferences

Combining demonstration-based learning with multi-objective RL

Ensuring dynamic policy adaptation without retraining

Innovation

Methods, ideas, or system contributions that make the work stand out.

Combines demonstration-based learning with MORL

Dynamic adaptation to changing user preferences

Modulates demonstration data and preference objectives

🔎 Similar Papers

No similar papers found.