Enhancing Online Reinforcement Learning with Meta-Learned Objective from Offline Data

📅 2025-01-13
📈 Citations: 0
Influential: 0
🤖 AI Summary
Sparse-reward online reinforcement learning (RL) suffers from inefficient policy optimization and often relies on expert demonstrations or hand-crafted auxiliary objectives. To address this, we propose GILD (Generalized Imitation Learning from Demonstration), a framework that meta-learns a transferable intrinsic reward objective from non-expert offline datasets. GILD requires no expert demonstrations, domain-specific priors, or task-dependent hyperparameters, and integrates seamlessly with off-policy RL algorithms such as SAC, TD3, and REDQ. On four sparse-reward MuJoCo benchmarks, GILD-enhanced agents consistently outperform state-of-the-art methods in both convergence speed and final performance, with negligible computational overhead. The key contribution is a meta-learning mechanism for generating general-purpose auxiliary objectives, enabling effective and scalable transfer from offline data to online RL.

📝 Abstract
A major challenge in Reinforcement Learning (RL) is the difficulty of learning an optimal policy from sparse rewards. Prior works enhance online RL with conventional Imitation Learning (IL) via a handcrafted auxiliary objective, at the cost of restricting the RL policy to be sub-optimal when the offline data is generated by a non-expert policy. Instead, to better leverage valuable information in offline data, we develop Generalized Imitation Learning from Demonstration (GILD), which meta-learns an objective that distills knowledge from offline data and instills intrinsic motivation towards the optimal policy. Distinct from prior works that are exclusive to a specific RL algorithm, GILD is a flexible module intended for diverse vanilla off-policy RL algorithms. In addition, GILD introduces no domain-specific hyperparameter and minimal increase in computational cost. In four challenging MuJoCo tasks with sparse rewards, we show that three RL algorithms enhanced with GILD significantly outperform state-of-the-art methods.
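The abstract describes GILD as a flexible module that adds a meta-learned intrinsic term to a vanilla off-policy objective. The following minimal sketch illustrates that general idea only; it is not the authors' implementation, and every name (`intrinsic_bonus`, `augmented_actor_objective`, the parameter `phi`, the `weight` coefficient) is a hypothetical stand-in.

```python
import numpy as np

# Hedged sketch (illustrative, not the paper's code): augment a vanilla
# off-policy actor objective with an intrinsic term whose parameters would,
# in a GILD-style method, be meta-learned from offline data.

def intrinsic_bonus(state_action, phi):
    # Stand-in for the meta-learned objective: a simple parameterized score
    # of a state-action feature vector. In practice this would be a neural
    # network whose parameters phi are meta-learned offline.
    return float(np.tanh(phi @ state_action))

def augmented_actor_objective(q_value, state_action, phi, weight=0.1):
    # Vanilla objective (maximize the critic's Q-value) plus the intrinsic
    # term. `weight` is an illustrative balancing coefficient; the paper
    # claims no task-dependent hyperparameters are introduced.
    return q_value + weight * intrinsic_bonus(state_action, phi)

sa = np.array([0.5, -0.2, 1.0])
phi = np.zeros(3)  # with zero meta-parameters the bonus vanishes
print(augmented_actor_objective(1.5, sa, phi))  # reduces to the plain Q-value, 1.5
```

Because the intrinsic term is additive, setting its contribution to zero recovers the underlying RL algorithm unchanged, which is consistent with the abstract's claim that GILD plugs into diverse vanilla off-policy methods.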
Problem

Research questions and friction points this paper is trying to address.

Sparse Reward
Reinforcement Learning
Non-Expert Dataset Performance
Innovation

Methods, ideas, or system contributions that make the work stand out.

GILD
Sparse Reward Optimization
Robot Learning
Shilong Deng
University of Electronic Science and Technology of China, Chengdu, China
Zetao Zheng
University of Electronic Science and Technology of China, Chengdu, China; Sichuan Artificial Intelligence Research Institute, Yibin, China
Hongcai He
University of Electronic Science and Technology of China, Chengdu, China
Paul Weng
Duke Kunshan University
Artificial Intelligence; Reinforcement Learning/Markov Decision Process; Qualitative/Ordinal Models
Jie Shao
Professor, University of Electronic Science and Technology of China
Multimedia; Database