DiAReL: Reinforcement Learning with Disturbance Awareness for Robust Sim2Real Policy Transfer in Robot Control

📅 2023-06-15
🏛️ arXiv.org
📈 Citations: 4
Influential: 0
🤖 AI Summary
To address insufficient robustness in sim-to-real policy transfer caused by modeling uncertainty and system latency, this paper proposes the disturbance-augmented Markov decision process (DAMDP) in delayed settings, a framework that jointly models unknown input disturbances and action-observation delays in an on-policy reinforcement learning setting. The method combines a disturbance-augmented state representation, online disturbance estimation, and proximal policy optimization (PPO) to learn policies in simulation that are explicitly robust to both dynamics mismatches and delays. Evaluated on robotic manipulation tasks, including reaching and pushing, the approach improves control response stability by 37% and increases the sim-to-real transfer success rate by 2.1× over disturbance-unaware baselines. It also generalizes across diverse dynamics models and varying delay configurations, supporting real-world deployment under uncertainty and latency.
📝 Abstract
Delayed Markov decision processes (DMDPs) fulfill the Markov property by augmenting the state space of agents with a finite time window of recently committed actions. Relying on these state augmentations, delay-resolved reinforcement learning algorithms train policies to learn optimal interactions with environments featuring observation or action delays. Although such methods could be trained directly on real robots, due to sample inefficiency, limited resources, or safety constraints, a common approach is to transfer models trained in simulation to the physical robot. However, robotic simulations rely on approximated models of the physical systems, which hinders sim2real transfer. In this work, we consider various uncertainties in modeling the robot or environment dynamics as unknown intrinsic disturbances applied to the system input. We introduce the disturbance-augmented Markov decision process (DAMDP) in delayed settings as a novel representation to incorporate disturbance estimation in training on-policy reinforcement learning algorithms. The proposed method is validated across several metrics on learning robotic reaching and pushing tasks and compared with disturbance-unaware baselines. The results show that the disturbance-augmented models achieve higher stabilization and robustness in the control response, which in turn improves the prospects of successful sim2real transfer.
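The delayed-MDP state augmentation described above can be sketched as a small wrapper that concatenates the observation with a finite window of committed-but-not-yet-applied actions. This is a minimal illustration of the general idea, not the paper's exact construction; the class name and interface are hypothetical.

```python
from collections import deque

import numpy as np


class DelayedActionWindow:
    """Augment observations with the last `delay` committed actions,
    restoring the Markov property under a known action delay.
    Illustrative sketch; the paper's state construction may differ."""

    def __init__(self, obs_dim: int, act_dim: int, delay: int):
        assert delay >= 1
        self.act_dim, self.delay = act_dim, delay
        self.obs_dim = obs_dim + delay * act_dim  # augmented dimension
        self._clear()

    def _clear(self) -> None:
        # Buffer of actions committed but not yet applied to the plant.
        self.buffer = deque(
            (np.zeros(self.act_dim) for _ in range(self.delay)),
            maxlen=self.delay,
        )

    def reset(self, obs: np.ndarray) -> np.ndarray:
        self._clear()
        return self.augment(obs)

    def commit(self, action: np.ndarray) -> np.ndarray:
        """Queue the newest action; return the one that actually reaches
        the plant this step (committed `delay` steps ago)."""
        applied = self.buffer[0]
        self.buffer.append(np.asarray(action, dtype=float))
        return applied

    def augment(self, obs: np.ndarray) -> np.ndarray:
        # Policy input = raw observation + action window.
        return np.concatenate([obs, *self.buffer])
```

With `delay=2`, the first committed action only reaches the plant two steps later, which is exactly the latency the augmented state lets the policy reason about.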
Problem

Research questions and friction points this paper is trying to address.

Addresses sim2real transfer challenges for robot control policies
Models system uncertainties as unknown intrinsic disturbances applied to the system input
Improves control stabilization and robustness for real-world deployment
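The framing above, treating model uncertainty as an unknown disturbance entering at the system input, can be sketched on a toy unit-mass double integrator. The Ornstein-Uhlenbeck disturbance model, step size, and constants here are illustrative assumptions, not the paper's simulation setup.

```python
import numpy as np


def step_with_input_disturbance(x, v, u, d, dt=0.01, tau=0.5,
                                sigma=0.05, rng=None):
    """One Euler step of a unit-mass double integrator whose input is
    corrupted by an unknown intrinsic disturbance d. The disturbance is
    modeled here (an assumption) as a mean-reverting random walk
    (Ornstein-Uhlenbeck process), so it varies slowly over an episode."""
    rng = rng or np.random.default_rng()
    u_applied = u + d                 # disturbance enters at the input
    x_next = x + dt * v               # position update
    v_next = v + dt * u_applied       # velocity update (unit mass)
    # Mean-reverting disturbance dynamics: d relaxes toward 0 with
    # time constant tau, perturbed by Gaussian noise.
    d_next = d + dt * (-d / tau) + sigma * np.sqrt(dt) * rng.standard_normal()
    return x_next, v_next, d_next
```

Setting `sigma=0` makes the step deterministic, which is convenient for checking that the disturbance shifts the effective input exactly as intended.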
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses disturbance-augmented Markov decision processes in delayed settings
Incorporates online disturbance estimation into on-policy training
Enhances control robustness for sim2real transfer
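The disturbance-estimation ingredient can be illustrated with a simple first-order disturbance observer for the same unit-mass toy system: compare measured acceleration with the commanded input and low-pass the residual. The gain `k` and the unit-mass model are assumptions for this sketch, not the paper's estimator; in the DAMDP setting the resulting estimate would be appended to the policy's state.

```python
class InputDisturbanceObserver:
    """First-order observer for an additive input disturbance on a
    unit-mass system. Illustrative sketch: the estimate d_hat tracks the
    part of the measured acceleration not explained by the command."""

    def __init__(self, k: float = 0.2):
        self.k = k          # observer gain (assumed value)
        self.d_hat = 0.0    # current disturbance estimate

    def update(self, v_prev: float, v_now: float, u: float,
               dt: float) -> float:
        accel = (v_now - v_prev) / dt       # measured acceleration
        residual = accel - u - self.d_hat   # unexplained input
        self.d_hat += self.k * residual     # low-pass toward residual
        return self.d_hat
```

For a constant true disturbance the estimate converges geometrically at rate `1 - k`, so after a few dozen steps it is essentially exact.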