Improving Stochastic Action-Constrained Reinforcement Learning via Truncated Distributions

📅 2025-11-27

📈 Citations: 0

✨ Influential: 0

career value

201K/year

🤖 AI Summary

In reinforcement learning with constrained action spaces, existing stochastic policies based on truncated normal distributions suffer from inefficient computation of entropy, log-probabilities, and their gradients—leading to unstable policy updates and degraded performance. This paper proposes an exact statistical estimation framework for truncated normal policies: it introduces efficient numerical approximation algorithms enabling high-accuracy, low-overhead computation of entropy and log-probabilities; integrates differentiable sampling with analytical gradient derivation to ensure stable, predictable-time policy updates. Crucially, the method avoids approximating the truncated distribution with an unconstrained one, thereby significantly improving optimization quality. Evaluated on three safety-constrained benchmark tasks, our approach achieves an average performance gain of 23.6% and reduces training variance by 41% compared to state-of-the-art approximation methods.

Technology Category

Application Category

📝 Abstract

In reinforcement learning (RL), it is often advantageous to consider additional constraints on the action space to ensure safety or action relevance. Existing work on such action-constrained RL faces challenges regarding effective policy updates, computational efficiency, and predictable runtime. Recent work proposes to use truncated normal distributions for stochastic policy gradient methods. However, the computation of key characteristics, such as the entropy, log-probability, and their gradients, becomes intractable under complex constraints. Hence, prior work approximates these using the non-truncated distributions, which severely degrades performance. We argue that accurate estimation of these characteristics is crucial in the action-constrained RL setting, and propose efficient numerical approximations for them. We also provide an efficient sampling strategy for truncated policy distributions and validate our approach on three benchmark environments, which demonstrate significant performance improvements when using accurate estimations.

Problem

Research questions and friction points this paper is trying to address.

Efficiently approximates entropy and gradients for truncated distributions

Enables accurate policy updates in action-constrained reinforcement learning

Improves performance via numerical approximations and sampling strategies

Innovation

Methods, ideas, or system contributions that make the work stand out.

Efficient numerical approximations for truncated distributions

Accurate estimation of entropy and log-probability gradients

Sampling strategy for truncated policy distributions

🔎 Similar Papers

A Tractable Inference Perspective of Offline RL