Learning Adaptive Multi-Objective Robot Navigation Incorporating Demonstrations

📅 2024-04-07
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
To address the challenge of context-dependent user preference shifts in dynamic human-robot coexistence environments—where conventional static-reward reinforcement learning fails to adapt online—this paper proposes an online preference-adaptive navigation method that requires no retraining. The core innovation lies in a modulatable reward-teaching coupling mechanism, integrating multi-objective reinforcement learning (MORL), inverse-reinforcement-learning-inspired teaching embedding, and a preference-conditioned policy network, all embedded within a sim-to-real transfer framework. The approach enables zero-shot preference switching and real-time adjustment of teaching weights. Experimental evaluation on a dual-robot platform demonstrates significant improvements: 18.3% higher goal-reaching rate and 22.7% improved collision avoidance rate. Moreover, it accurately reproduces diverse, context-sensitive user preference behaviors, effectively overcoming two critical bottlenecks in personalized navigation—policy rigidity and context insensitivity.

Technology Category

Application Category

📝 Abstract
Preference-aligned robot navigation in human environments is typically achieved through learning-based approaches, utilizing user feedback or demonstrations for personalization. However, personal preferences are subject to change and might even be context-dependent. Yet traditional reinforcement learning (RL) approaches with static reward functions often fall short in adapting to these varying user preferences, inevitably reflecting demonstrations once training is completed. This paper introduces a framework that combines multi-objective reinforcement learning (MORL) with demonstration-based learning. Our approach allows for dynamic adaptation to changing user preferences without retraining. It fluently modulates between reward-defined preference objectives and the amount of demonstration data reflection. Through rigorous evaluations, including a sim-to-real transfer on two robots, we demonstrate our framework's capability to reflect user preferences accurately while achieving high navigational performance in terms of collision avoidance and goal pursuance.
Problem

Research questions and friction points this paper is trying to address.

Adapting robot navigation to changing user preferences
Combining demonstration-based learning with multi-objective RL
Ensuring dynamic policy adaptation without retraining
Innovation

Methods, ideas, or system contributions that make the work stand out.

Combines demonstration-based learning with MORL
Dynamic adaptation to changing user preferences
Modulates demonstration data and preference objectives
🔎 Similar Papers
No similar papers found.