Soft Deterministic Policy Gradient with Gaussian Smoothing

📅 2026-05-07

🤖 AI Summary
This work addresses ill-defined and unstable gradients in deterministic policy gradient (DPG) methods under sparse or discrete rewards, where the Q-function is not differentiable with respect to the action. The paper proposes Soft Deterministic Policy Gradient (Soft-DPG), which brings Gaussian smoothing into the DPG framework. By constructing a smoothed Bellman equation and defining a new action-value function from it, Soft-DPG removes the explicit dependence on the gradient of the Q-function with respect to actions, so the policy gradient stays well-defined even when the Q-function is non-smooth. The deep reinforcement learning instantiation, Soft DDPG, is reported to remain competitive with DDPG on standard dense-reward continuous control benchmarks and to provide clear gains in most of their discretized-reward variants.
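For orientation, the dependence on the critic's action-gradient can be written out. The block below states the standard DPG gradient and a generic Gaussian-smoothing identity; it is a sketch for intuition only. The symbols $\mu_\theta$, $\rho$, $\sigma$, and $Q_\sigma$ are notation assumed here, and the paper's own smoothed Bellman equation and action-value definition are not reproduced on this page and may differ.

```latex
% Standard deterministic policy gradient: requires \nabla_a Q to exist.
\nabla_\theta J(\theta)
  = \mathbb{E}_{s \sim \rho}\!\left[
      \nabla_\theta \mu_\theta(s)\,
      \nabla_a Q(s,a)\big|_{a=\mu_\theta(s)}
    \right]

% Generic Gaussian smoothing of Q over actions (assumed notation):
Q_\sigma(s,a)
  = \mathbb{E}_{\epsilon \sim \mathcal{N}(0,I)}\!\left[ Q(s,\, a + \sigma\epsilon) \right]

% Its action-gradient needs only evaluations of Q, not \nabla_a Q:
\nabla_a Q_\sigma(s,a)
  = \frac{1}{\sigma}\,
    \mathbb{E}_{\epsilon \sim \mathcal{N}(0,I)}\!\left[ Q(s,\, a + \sigma\epsilon)\,\epsilon \right]
```

The last identity holds for any bounded measurable $Q$, including piecewise-constant ones, which is why smoothing can keep the gradient well-defined where $\nabla_a Q$ is not.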
📝 Abstract
Deterministic policy gradient (DPG) is widely utilized for continuous control; however, it inherently relies on the differentiability of the critic with respect to the action during policy updates. This assumption is violated in practical control problems involving sparse or discrete rewards, leading to ill-defined policy gradients and unstable learning. To address these challenges, we propose a principled alternative based on a smoothed Bellman equation formulated via Gaussian smoothing. Specifically, we define a novel action-value function based on a smoothed Bellman equation and derive the soft deterministic policy gradient (Soft-DPG). Our formulation eliminates explicit dependence on critic action-gradients and ensures that the gradient remains well-defined even for non-smooth Q-functions. We instantiate this framework into a deep reinforcement learning algorithm, which we call soft deep deterministic policy gradient (Soft DDPG). Empirical evaluations on standard continuous control benchmarks and their discretized-reward variants show that Soft DDPG remains competitive in dense-reward settings and provides clear gains in most discretized-reward environments, where standard DDPG is more sensitive to irregular critic landscapes.
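The effect the abstract describes can be illustrated with a small numerical sketch. This is not the paper's Soft DDPG algorithm: `step_q` is a hypothetical one-dimensional piecewise-constant critic, and `sigma`, the sample count, and the threshold 0.5 are arbitrary choices made for illustration. The sketch compares the direct action-gradient of the critic, which is zero almost everywhere, with a Monte Carlo estimate of the gradient of its Gaussian-smoothed version, which uses only critic evaluations.

```python
import math
import numpy as np

def step_q(a):
    """Hypothetical piecewise-constant critic: 1 if a > 0.5, else 0."""
    return np.where(np.asarray(a) > 0.5, 1.0, 0.0)

def finite_diff_grad(q, a, h=1e-4):
    """Central finite difference of q at a (stands in for dQ/da)."""
    return float((q(a + h) - q(a - h)) / (2 * h))

def smoothed_grad(q, a, sigma=0.3, n=200_000, seed=0):
    """Monte Carlo gradient of E[q(a + sigma * eps)], eps ~ N(0, 1).

    Uses the Gaussian identity
        d/da E[q(a + sigma*eps)] = E[q(a + sigma*eps) * eps] / sigma,
    so only evaluations of q are needed, never its derivative.
    """
    rng = np.random.default_rng(seed)
    eps = rng.standard_normal(n)
    return float(np.mean(q(a + sigma * eps) * eps) / sigma)

a, sigma = 0.2, 0.3
g_raw = finite_diff_grad(step_q, a)            # flat region: no signal
g_smooth = smoothed_grad(step_q, a, sigma)     # points toward the step

# Closed form for this particular step critic: the smoothed critic is
# Phi((a - 0.5) / sigma), whose derivative is phi((a - 0.5) / sigma) / sigma.
z = (a - 0.5) / sigma
g_exact = math.exp(-0.5 * z * z) / math.sqrt(2 * math.pi) / sigma

print(g_raw, g_smooth, g_exact)
```

At `a = 0.2` the raw gradient gives the actor no direction to move in, while the smoothed estimate is positive and close to the closed-form value, pointing toward the higher-reward region. A larger `sigma` widens the region in which this signal is available, at the cost of more bias relative to the unsmoothed critic.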
Problem

Research questions and friction points this paper is trying to address.

Deterministic Policy Gradient
Sparse Rewards
Discrete Rewards
Non-smooth Q-functions
Policy Gradient Stability
Innovation

Methods, ideas, or system contributions that make the work stand out.

Soft Deterministic Policy Gradient
Gaussian Smoothing
Smoothed Bellman Equation
Non-smooth Q-functions
Discrete Rewards