Efficient On-policy Visual-RL via Stochastic Decoupled Policy Gradient

📅 2026-05-25

🤖 AI Summary

This work targets the high compute and memory costs and the slow training of visual reinforcement learning. It proposes Stochastic Decoupled Policy Gradient (SDPG), a method that trains visuomotor control policies end-to-end by estimating policy gradients from lightweight random perturbations of trajectory rollouts, which substantially reduces reliance on batch-rendered environments. The authors also introduce a new visual robotics benchmark suite covering dexterous manipulation and challenging locomotion tasks. On visual MuJoCo tasks, SDPG outperforms existing baselines in training time, memory consumption, and cumulative reward, and the authors demonstrate sim-to-real transfer on physical robots.

📝 Abstract

We present the stochastic decoupled policy gradient (SDPG), a lightweight visual reinforcement learning (RL) method that trains diverse visuomotor control policies end-to-end within a few hours on a single NVIDIA RTX 4080 GPU. SDPG estimates policy gradients via random perturbations of trajectory rollouts, requiring orders of magnitude fewer batch-rendered environments and substantially reducing compute and memory overhead. On visual MuJoCo benchmarks, SDPG consistently outperforms baseline methods in training time, memory usage, and rewards. Finally, to support future research, we introduce a suite of realistic visual robotics benchmarks spanning dexterous manipulation and challenging locomotion, and demonstrate effective sim-to-real transfer on physical hardware.
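
The abstract does not spell out the SDPG estimator, so the following is only a generic illustration of estimating a gradient from random perturbations. It is not the authors' algorithm. It is a minimal sketch of a standard antithetic Gaussian-perturbation (zeroth-order) estimator, assuming a hypothetical two-parameter "policy" and a toy quadratic function `toy_return` standing in for a rollout's return. All names, the step size, and the noise scale are illustrative assumptions.

```python
import random


def toy_return(params):
    """Hypothetical stand-in for a rollout return; maximal at (1.0, -2.0)."""
    target = (1.0, -2.0)
    return -sum((p - t) ** 2 for p, t in zip(params, target))


def perturbation_gradient(f, params, sigma=0.1, num_samples=64, rng=None):
    """Estimate the gradient of f at params from antithetic Gaussian perturbations.

    Generic estimator (an assumption, not the SDPG formula):
        g ~= mean_eps [ (f(x + sigma*eps) - f(x - sigma*eps)) / (2*sigma) * eps ]
    """
    if rng is None:
        rng = random.Random(0)
    grad = [0.0] * len(params)
    for _ in range(num_samples):
        eps = [rng.gauss(0.0, 1.0) for _ in params]
        plus = f([p + sigma * e for p, e in zip(params, eps)])
        minus = f([p - sigma * e for p, e in zip(params, eps)])
        scale = (plus - minus) / (2.0 * sigma)
        for i, e in enumerate(eps):
            grad[i] += scale * e / num_samples
    return grad


def train(steps=200, lr=0.05, seed=0):
    """Gradient ascent on the toy return using only perturbed evaluations."""
    rng = random.Random(seed)
    params = [0.0, 0.0]
    for _ in range(steps):
        grad = perturbation_gradient(toy_return, params, rng=rng)
        params = [p + lr * g for p, g in zip(params, grad)]
    return params


if __name__ == "__main__":
    print(train())
```

The estimator only ever evaluates the return at perturbed parameters, so it needs no backpropagation through the function being optimized. How SDPG actually perturbs rollouts and decouples the gradient is described in the paper itself, not here.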

Problem

Research questions and friction points this paper is trying to address.

visual reinforcement learning

on-policy

policy gradient

efficient training

visuomotor control

Innovation

Methods, ideas, or system contributions that make the work stand out.

Stochastic Decoupled Policy Gradient

Visual Reinforcement Learning

Efficient On-policy Learning

Sim-to-Real Transfer

Lightweight RL
