ETGL-DDPG: A Deep Deterministic Policy Gradient Algorithm for Sparse Reward Continuous Control

📅 2024-10-07
🏛️ arXiv.org
📈 Citations: 1
Influential: 0

🤖 AI Summary
To address insufficient exploration and inefficient use of rewards in sparse-reward continuous-control tasks under DDPG, this paper proposes three complementary improvements: (1) an $\epsilon t$-greedy search procedure that generates exploratory options to reach less-visited states; (2) a dual experience replay buffer framework (GDRB) that separates high- and low-return trajectories so that rewarded transitions are used more efficiently; and (3) longest n-step return estimation to propagate sparse positive rewards further back in time. None of the modifications requires additional networks or model assumptions, so the algorithm stays simple while training stability and convergence speed improve. Evaluated on standard sparse-reward benchmarks, including AntMaze and Sparse HalfCheetah, the method outperforms the original DDPG as well as state-of-the-art approaches (TD3, SAC-Sparse). Ablation studies indicate that each component contributes to overall performance and that the contributions are complementary.

📝 Abstract
We consider deep deterministic policy gradient (DDPG) in the context of reinforcement learning with sparse rewards. To enhance exploration, we introduce a search procedure, $\epsilon t$-greedy, which generates exploratory options for exploring less-visited states. We prove that search using $\epsilon t$-greedy has polynomial sample complexity under mild MDP assumptions. To more efficiently use the information provided by rewarded transitions, we develop a new dual experience replay buffer framework, GDRB, and implement longest n-step returns. The resulting algorithm, ETGL-DDPG, integrates all three techniques, $\epsilon t$-greedy, GDRB, and Longest $n$-step, into DDPG. We evaluate ETGL-DDPG on standard benchmarks and demonstrate that it outperforms DDPG, as well as other state-of-the-art methods, across all tested sparse-reward continuous environments. Ablation studies further highlight how each strategy individually enhances the performance of DDPG in this setting.
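
As a rough illustration of the longest n-step return idea, the sketch below assumes that "longest" means each transition's return is accumulated from its own time step all the way to the end of the episode, with an optional bootstrap value when the episode was cut off rather than terminated. The function name, the `bootstrap_value` parameter, and the default discount are illustrative assumptions and are not taken from the paper.

```python
from typing import List


def longest_n_step_returns(
    rewards: List[float],
    gamma: float = 0.99,
    bootstrap_value: float = 0.0,
) -> List[float]:
    """Return, for every step t, the discounted sum of rewards from t to
    the end of the episode, plus a discounted bootstrap term.

    Assumption: bootstrap_value stands in for a critic estimate at the
    state following the last reward (use 0.0 for a terminal state).
    """
    returns = [0.0] * len(rewards)
    running = bootstrap_value
    # Walk backwards so each return reuses the one computed after it.
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


if __name__ == "__main__":
    # A sparse-reward episode: the only reward arrives at the last step.
    print(longest_n_step_returns([0.0, 0.0, 1.0], gamma=0.5))
```

With a single reward at the final step, every earlier transition receives a nonzero target after one pass, which is the sense in which a long horizon speeds up propagation of sparse rewards compared with one-step bootstrapping.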
Problem

Research questions and friction points this paper is trying to address.

Enhances exploration in sparse reward environments
Introduces dual experience replay buffer framework
Integrates multiple techniques to outperform DDPG
Innovation

Methods, ideas, or system contributions that make the work stand out.

$\epsilon t$-greedy search
dual experience replay buffer
longest n-step returns
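
The dual replay buffer idea listed above can be sketched as follows. This is a minimal sketch under stated assumptions, not the paper's GDRB: it assumes whole episodes are routed to one of two buffers depending on whether they obtained any reward, and that each batch mixes the two at a fixed ratio. The class name `DualReplayBuffer`, the `rewarded_fraction` parameter, and the routing rule are all hypothetical.

```python
import random
from collections import deque
from typing import Any, Deque, List, Optional, Sequence


class DualReplayBuffer:
    """Two FIFO buffers: one for episodes that obtained reward, one for the rest."""

    def __init__(
        self,
        capacity: int = 100_000,
        rewarded_fraction: float = 0.5,  # assumed fixed mixing ratio
        seed: Optional[int] = None,
    ) -> None:
        self.rewarded: Deque[Any] = deque(maxlen=capacity)
        self.unrewarded: Deque[Any] = deque(maxlen=capacity)
        self.rewarded_fraction = rewarded_fraction
        self.rng = random.Random(seed)

    def add_episode(self, transitions: Sequence[Any], obtained_reward: bool) -> None:
        # Assumption: routing is decided per episode, not per transition.
        target = self.rewarded if obtained_reward else self.unrewarded
        target.extend(transitions)

    def sample(self, batch_size: int) -> List[Any]:
        if not self.rewarded and not self.unrewarded:
            raise ValueError("cannot sample from empty buffers")
        n_rewarded = round(batch_size * self.rewarded_fraction)
        # Fall back to a single buffer while the other one is still empty.
        if not self.rewarded:
            n_rewarded = 0
        elif not self.unrewarded:
            n_rewarded = batch_size
        batch: List[Any] = []
        if n_rewarded > 0:
            batch += self.rng.choices(self.rewarded, k=n_rewarded)
        if batch_size - n_rewarded > 0:
            batch += self.rng.choices(self.unrewarded, k=batch_size - n_rewarded)
        return batch
```

Sampling is with replacement to keep the sketch short. Keeping rewarded episodes in their own buffer means they are not overwritten by the much larger stream of unrewarded experience, which is one plausible reading of how a dual buffer uses rewarded transitions more efficiently.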