H2O+: An Improved Framework for Hybrid Offline-and-Online RL with Dynamics Gaps

📅 2023-09-22
🏛️ arXiv.org
📈 Citations: 6
Influential: 0
📄 PDF
🤖 AI Summary
In real-world scenarios where high-fidelity simulation is unavailable and offline data is scarce, existing online reinforcement learning (RL) suffers from sim-to-real dynamics mismatch, while offline RL is constrained by limited data scale and quality. Method: We propose a hybrid offline-online RL framework that systematically models and compensates for the dynamics gap between simulation and reality. Our approach features model-based gap modeling, a two-stage policy optimization scheme, and a joint mechanism of offline pretraining followed by online adaptive fine-tuning. It supports arbitrary combinations of offline and online RL algorithms, enhancing framework generality. Contribution/Results: Evaluated across diverse simulated and real-robot tasks, our method achieves up to 3.2× higher sample efficiency and a 41% improvement in policy transfer success rate over state-of-the-art cross-domain RL approaches.
📝 Abstract
Solving real-world complex tasks using reinforcement learning (RL) without high-fidelity simulation environments or large amounts of offline data can be quite challenging. Online RL agents trained in imperfect simulation environments can suffer from severe sim-to-real issues. Offline RL approaches although bypass the need for simulators, often pose demanding requirements on the size and quality of the offline datasets. The recently emerged hybrid offline-and-online RL provides an attractive framework that enables joint use of limited offline data and imperfect simulator for transferable policy learning. In this paper, we develop a new algorithm, called H2O+, which offers great flexibility to bridge various choices of offline and online learning methods, while also accounting for dynamics gaps between the real and simulation environment. Through extensive simulation and real-world robotics experiments, we demonstrate superior performance and flexibility over advanced cross-domain online and offline RL algorithms.
Problem

Research questions and friction points this paper is trying to address.

Bridging dynamics gaps between real and simulation environments
Combining limited offline data with imperfect simulators
Improving transferable policy learning in hybrid RL
Innovation

Methods, ideas, or system contributions that make the work stand out.

Hybrid offline-and-online RL framework
Bridges offline and online learning methods
Accounts for dynamics gaps simulation-real
🔎 Similar Papers
No similar papers found.