Efficient On-policy Visual-RL via Stochastic Decoupled Policy Gradient

📅 2026-05-25
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the high compute and memory costs and the low training efficiency of visual reinforcement learning by proposing Stochastic Decoupled Policy Gradient (SDPG). SDPG trains visuomotor control policies end-to-end, estimating policy gradients from lightweight stochastic perturbations during trajectory replay, which substantially reduces reliance on batch-rendered environments. The work also introduces a new visual robotics benchmark covering dexterous manipulation and complex locomotion tasks. On visual MuJoCo tasks, SDPG outperforms existing baselines in training time, memory consumption, and cumulative reward, and the authors report successful sim-to-real transfer on physical robots.
📝 Abstract
We present the stochastic decoupled policy gradient (SDPG), a lightweight visual reinforcement learning (RL) method that trains diverse visuomotor control policies end-to-end within a few hours on a single NVIDIA RTX 4080 GPU. SDPG estimates policy gradients via random perturbations of trajectory rollouts, requiring orders of magnitude fewer batch-rendered environments and substantially reducing compute and memory overhead. On visual MuJoCo benchmarks, SDPG consistently outperforms baseline methods in training time, memory usage, and rewards. Finally, to support future research, we introduce a suite of realistic visual robotics benchmarks spanning dexterous manipulation and challenging locomotion, and demonstrate effective sim-to-real transfer on physical hardware.
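The abstract states only that SDPG estimates policy gradients via random perturbations of trajectory rollouts; it does not give the estimator. As a rough illustration of the general idea of perturbation-based gradient estimation, and not of SDPG itself, the sketch below uses a generic antithetic (mirrored) random-perturbation estimator on a toy objective. Every name here (`perturbation_gradient`, `toy_return`, `sigma`, `num_pairs`) is hypothetical, and the toy quadratic merely stands in for a rollout's cumulative reward.

```python
import numpy as np


def perturbation_gradient(objective, theta, rng, num_pairs=64, sigma=0.1):
    """Generic antithetic random-perturbation gradient estimate.

    Illustrative assumption only: this is NOT the SDPG estimator, whose
    details are not given in the abstract.
    """
    grad = np.zeros_like(theta, dtype=float)
    for _ in range(num_pairs):
        eps = rng.standard_normal(theta.shape)
        # Compare the objective at mirrored perturbations of the parameters.
        delta = objective(theta + sigma * eps) - objective(theta - sigma * eps)
        grad += delta / (2.0 * sigma) * eps
    return grad / num_pairs


def toy_return(theta):
    # Hypothetical stand-in for cumulative reward, maximized at [1, -2].
    target = np.array([1.0, -2.0])
    return -np.sum((theta - target) ** 2)


rng = np.random.default_rng(0)
theta = np.zeros(2)
for _ in range(200):
    # Gradient ascent on the estimated gradient.
    theta = theta + 0.05 * perturbation_gradient(toy_return, theta, rng)
```

The estimator needs only evaluations of the objective, not its derivatives, which is why perturbation-based approaches can avoid backpropagating through an expensive simulator or renderer. How SDPG structures its perturbations and decouples them from rendering is described in the paper, not here.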
Problem

Research questions and friction points this paper is trying to address.

visual reinforcement learning
on-policy
policy gradient
efficient training
visuomotor control
Innovation

Methods, ideas, or system contributions that make the work stand out.

Stochastic Decoupled Policy Gradient
Visual Reinforcement Learning
Efficient On-policy Learning
Sim-to-Real Transfer
Lightweight RL