Robot Policy Transfer with Online Demonstrations: An Active Reinforcement Learning Approach

📅 2025-03-17
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address covariate shift in policy transfer caused by offline demonstrations, this paper introduces the first online demonstration-based active learning paradigm for policy transfer: under a fixed demonstration budget, it dynamically optimizes both the timing and content of demonstrations via an active querying mechanism. The method integrates online learning, learning from demonstration (LfD), and reinforcement learning to enable adaptive policy calibration in out-of-distribution environments. Key contributions include: (1) the first online demonstration-driven policy transfer framework; (2) a budget-aware joint optimization mechanism for temporal scheduling and demonstration content selection; and (3) overcoming performance bottlenecks inherent in offline demonstration methods. Evaluated across eight cross-environment, cross-task, and cross-embodiment robotic simulation scenarios, the approach achieves a 27.4% average success rate improvement and 3.1× higher sample efficiency. It further demonstrates robust sim-to-real transfer across three real-world deployment settings.

📝 Abstract
Transfer Learning (TL) is a powerful tool that enables robots to transfer learned policies across different environments, tasks, or embodiments. To further facilitate this process, efforts have been made to combine it with Learning from Demonstrations (LfD) for more flexible and efficient policy transfer. However, these approaches are almost exclusively limited to offline demonstrations collected before policy transfer starts, which may suffer from the intrinsic issue of covariate shift brought by LfD and harm the performance of policy transfer. Meanwhile, extensive work in the learning-from-scratch setting has shown that online demonstrations can effectively alleviate covariate shift and lead to better policy performance with improved sample efficiency. This work combines these insights to introduce online demonstrations into the policy transfer setting. We present Policy Transfer with Online Demonstrations, an active LfD algorithm for policy transfer that optimizes the timing and content of queries for online episodic expert demonstrations under a limited demonstration budget. We evaluate our method in eight robotic scenarios involving policy transfer across diverse environment characteristics, task objectives, and robotic embodiments, with the aim of transferring a trained policy from a source task to a related but different target task. The results show that our method significantly outperforms all baselines, including two canonical LfD methods with offline demonstrations and one active LfD method with online demonstrations, in terms of average success rate and sample efficiency. Additionally, we conduct preliminary sim-to-real tests of the transferred policy in three transfer scenarios in a real-world environment, demonstrating the policy's effectiveness on a real robot manipulator.
Problem

Research questions and friction points this paper is trying to address.

Enhance robot policy transfer using online demonstrations.
Address covariate shift in Learning from Demonstrations (LfD).
Optimize timing and content of online expert demonstrations.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Integrates online demonstrations into policy transfer.
Optimizes the timing and content of expert queries.
Improves success rate and sample efficiency.
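To make the budget-aware querying idea concrete, here is a minimal sketch of one way an agent could decide when to request an online expert demonstration. The class name, the uncertainty signal, and the threshold schedule are all illustrative assumptions, not the paper's actual mechanism: the intuition is simply that queries are spent on high-uncertainty episodes and become harder to trigger as the remaining budget shrinks.

```python
class BudgetedDemoQuery:
    """Toy budget-aware active demonstration querying (illustrative only).

    At each episode, query an expert demonstration only when the
    estimated policy uncertainty exceeds a threshold that tightens
    as the remaining demonstration budget is consumed.
    """

    def __init__(self, budget: int, base_threshold: float = 0.5):
        self.budget = budget          # total number of demonstrations allowed
        self.used = 0                 # demonstrations spent so far
        self.base_threshold = base_threshold

    def should_query(self, uncertainty: float) -> bool:
        """Return True if an expert demonstration should be requested now."""
        if self.used >= self.budget:
            return False  # budget exhausted: never query again
        remaining = (self.budget - self.used) / self.budget
        # Threshold rises from base_threshold toward 1.0 as budget depletes,
        # so late queries require higher uncertainty to justify the spend.
        threshold = self.base_threshold + (1 - remaining) * (1 - self.base_threshold)
        if uncertainty >= threshold:
            self.used += 1
            return True
        return False
```

For example, with a budget of two demonstrations and a base threshold of 0.5, the first query fires at any uncertainty of 0.5 or above, while the second requires at least 0.75; after that, all requests are refused regardless of uncertainty.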