Accelerating Residual Reinforcement Learning with Uncertainty Estimation

📅 2025-06-20
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing residual reinforcement learning (RL) suffers from low sample efficiency under sparse rewards and struggles to accommodate stochastic base policies (e.g., Gaussian or diffusion-based policies). To address this, we propose an uncertainty-guided off-policy residual RL framework. First, we estimate base policy uncertainty via Bayesian or ensemble methods to dynamically guide exploration. Second, we introduce an off-policy residual Q-learning mechanism with observable base actions—enabling stable training for the first time with stochastic base policies. Our method seamlessly integrates Gaussian policy optimization and diffusion-based policy modeling. Evaluated on multi-task benchmarks (Robosuite and D4RL), it significantly outperforms fine-tuning, imitation-augmented, and prior residual RL approaches. Moreover, it achieves zero-shot sim-to-real transfer and robust execution on real robots. Key contributions include: (1) uncertainty-driven exploration grounded in base policy estimation, and (2) the first off-policy residual RL framework supporting stochastic base policies.

📝 Abstract
Residual Reinforcement Learning (RL) is a popular approach for adapting pretrained policies by learning a lightweight residual policy that provides corrective actions. While Residual RL is more sample-efficient than finetuning the entire base policy, existing methods struggle with sparse rewards and are designed for deterministic base policies. We propose two improvements to Residual RL that further enhance its sample efficiency and make it suitable for stochastic base policies. First, we leverage uncertainty estimates of the base policy to focus exploration on regions in which the base policy is not confident. Second, we propose a simple modification to off-policy residual learning that allows it to observe base actions and better handle stochastic base policies. We evaluate our method with both Gaussian-based and Diffusion-based stochastic base policies on tasks from Robosuite and D4RL, and compare against state-of-the-art finetuning methods, demo-augmented RL methods, and other residual RL methods. Our algorithm significantly outperforms existing baselines in a variety of simulation benchmark environments. We also deploy our learned policies in the real world to demonstrate their robustness with zero-shot sim-to-real transfer.
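The first improvement, uncertainty-guided exploration, can be illustrated with a common ensemble-disagreement estimator: the spread of actions proposed by an ensemble of base-policy heads serves as an epistemic-uncertainty proxy, which then scales the residual policy's exploration noise. This is a minimal sketch under that assumption; the paper's exact estimator (Bayesian vs. ensemble) and the mapping from uncertainty to exploration are not specified here, and all function names and bounds are illustrative.

```python
import numpy as np

def ensemble_uncertainty(base_policies, obs):
    """Disagreement among an ensemble of base-policy heads as a proxy for
    the base policy's epistemic uncertainty at this observation."""
    actions = np.stack([policy(obs) for policy in base_policies])  # (K, action_dim)
    mean_action = actions.mean(axis=0)
    # Scalar uncertainty: average per-dimension standard deviation.
    sigma = float(actions.std(axis=0).mean())
    return mean_action, sigma

def residual_exploration_scale(sigma, lo=0.05, hi=0.5):
    """Map uncertainty to an exploration-noise scale for the residual policy:
    explore more where the base policy is unsure, little where it is confident.
    The bounds lo/hi are illustrative hyperparameters."""
    return float(np.clip(lo + sigma, lo, hi))
```

With this shaping, rollout noise collapses toward `lo` in states the base policy already handles well, concentrating exploration (and thus samples) on uncertain regions, which is the stated source of the sample-efficiency gain under sparse rewards.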
Problem

Research questions and friction points this paper is trying to address.

Enhancing sample efficiency in Residual RL for stochastic policies
Addressing sparse rewards in Residual Reinforcement Learning
Improving robustness for sim-to-real transfer in RL
Innovation

Methods, ideas, or system contributions that make the work stand out.

Leverage base policy uncertainty for focused exploration
Modify off-policy learning to observe base actions
Handle stochastic base policies effectively
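The second improvement, observing base actions, can be sketched as follows: the executed action is the standard residual-RL sum of base and residual actions, and the critic's input is augmented with the sampled base action, so that a stochastic base policy (Gaussian or diffusion) no longer appears to the off-policy learner as unexplained environment noise. This is a hedged sketch of the generic mechanism; the function names, action bounds, and concatenation layout are assumptions, not the paper's exact interfaces.

```python
import numpy as np

def compose_action(base_action, residual_action, low=-1.0, high=1.0):
    """Standard residual-RL composition: execute the base action plus a
    learned corrective residual, clipped to the action bounds."""
    return np.clip(base_action + residual_action, low, high)

def critic_input(obs, base_action, residual_action):
    """Condition the off-policy critic on the observed (sampled) base action
    in addition to the observation and residual action, so value targets
    account for the base policy's stochasticity."""
    return np.concatenate([obs, base_action, residual_action])
```

Without the extra conditioning, two identical observations can yield different executed actions purely because the base policy resampled, which destabilizes off-policy TD targets; making the base action observable removes that aliasing.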