🤖 AI Summary
Pre-planned trajectories often lack the safety, feasibility, and task adaptability needed in dynamic environments. To address this, the paper proposes a trajectory optimization framework that integrates Episodic Reinforcement Learning (ERL) with residual learning. Rather than training policies from scratch, the method uses an initial reference trajectory as prior knowledge. Within each episode, it identifies the local trajectory segments that require correction and generates smooth, physically feasible residual adjustments for them using B-spline movement primitives, which preserves critical maneuvers while keeping the trajectory continuous. The framework is modular and can be integrated into any ERL algorithm. Experiments show that, compared with training from scratch, the approach significantly improves sample efficiency (reducing sampling requirements by 67%) and task success rate. The learned policy transfers directly to real robotic hardware without fine-tuning, and physical deployment confirms a small sim-to-real gap along with robust, real-time performance.
📝 Abstract
We propose MoRe-ERL, a framework that combines Episodic Reinforcement Learning (ERL) and residual learning to refine preplanned reference trajectories into safe, feasible, and efficient task-specific trajectories. The framework is general enough to be incorporated seamlessly into arbitrary ERL methods and motion generators. MoRe-ERL first identifies the trajectory segments that require modification while preserving critical task-related maneuvers. It then generates smooth residual adjustments using B-Spline-based movement primitives, which ensures adaptability to dynamic task contexts and smoothness in trajectory refinement. Experimental results demonstrate that residual learning significantly outperforms training from scratch using ERL methods, achieving superior sample efficiency and task performance. Hardware evaluations further validate the framework, showing that policies trained in simulation can be directly deployed in real-world systems with a minimal sim-to-real gap.
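The idea of adding a smooth B-spline residual to one segment of a reference trajectory can be illustrated with a minimal sketch. This is not the authors' implementation. The function names (`clamped_knots`, `bspline_basis`, `eval_bspline`, `refine`), the single-joint trajectory, the fixed segment bounds, and the hand-picked weights are all hypothetical. In the framework described above, the segment and the residual parameters would come from the learned policy. Padding the control points with zeros so that the residual vanishes at both segment ends is likewise an assumption made here to keep the refined trajectory continuous with the untouched parts of the reference.

```python
def clamped_knots(n_ctrl, degree):
    """Clamped uniform knot vector on [0, 1] for n_ctrl control points."""
    n_interior = n_ctrl - degree - 1
    interior = [j / (n_interior + 1) for j in range(1, n_interior + 1)]
    return [0.0] * (degree + 1) + interior + [1.0] * (degree + 1)


def bspline_basis(i, k, u, knots):
    """Cox-de Boor recursion: i-th basis function of degree k evaluated at u."""
    if k == 0:
        return 1.0 if knots[i] <= u < knots[i + 1] else 0.0
    left = 0.0
    denom = knots[i + k] - knots[i]
    if denom > 0:
        left = (u - knots[i]) / denom * bspline_basis(i, k - 1, u, knots)
    right = 0.0
    denom = knots[i + k + 1] - knots[i + 1]
    if denom > 0:
        right = (knots[i + k + 1] - u) / denom * bspline_basis(i + 1, k - 1, u, knots)
    return left + right


def eval_bspline(ctrl, u, knots, degree):
    """Evaluate the spline at u in [0, 1]; a clamped spline ends at its last control point."""
    if u >= 1.0:
        return ctrl[-1]
    return sum(c * bspline_basis(i, degree, u, knots) for i, c in enumerate(ctrl))


def refine(reference, start, end, weights, degree=3):
    """Add a B-spline residual to reference[start..end]; other samples are untouched.

    Assumption: two zero control points are padded on each side, so the
    residual (and its slope) is zero at both ends of the segment.
    """
    ctrl = [0.0, 0.0] + list(weights) + [0.0, 0.0]
    knots = clamped_knots(len(ctrl), degree)
    refined = list(reference)
    span = end - start
    for t in range(start, end + 1):
        u = (t - start) / span  # normalized phase within the segment
        refined[t] += eval_bspline(ctrl, u, knots, degree)
    return refined


# Hypothetical example: a flat single-joint reference, corrected on steps 5..15.
reference = [0.0] * 21
refined = refine(reference, start=5, end=15, weights=[0.3, 0.5, 0.3])
```

Because only the samples inside the chosen segment change, the rest of the reference (the "critical maneuvers" in the abstract's terms) is preserved exactly, and the residual stays within the range spanned by its control points.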