Viability of Future Actions: Robust Safety in Reinforcement Learning via Entropy Regularization

📅 2025-06-12
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This work addresses the problem of robustly satisfying state constraints for reinforcement learning (RL) policies under unknown disturbances. We propose a synergistic mechanism combining entropy regularization with constraint penalization (Lagrangian or penalty methods). First, we theoretically establish that entropy regularization intrinsically biases policies toward actions with higher future feasibility, thereby enhancing disturbance resilience and safety. Second, we relax strict constrained RL into an approximately equivalent unconstrained RL problem—solvable by standard off-the-shelf algorithms—achieving a principled trade-off among safety, optimality, and robustness. Our model-free approach integrates reward shaping and constraint learning within frameworks such as Soft Actor-Critic (SAC). Evaluated on multiple continuous-control benchmarks, our method reduces constraint violation rates by up to 47% while preserving near-optimal task performance, demonstrating significant improvements in robustness against both action noise and environmental disturbances.

Technology Category

Application Category

📝 Abstract
Despite the many recent advances in reinforcement learning (RL), the question of learning policies that robustly satisfy state constraints under unknown disturbances remains open. In this paper, we offer a new perspective on achieving robust safety by analyzing the interplay between two well-established techniques in model-free RL: entropy regularization, and constraints penalization. We reveal empirically that entropy regularization in constrained RL inherently biases learning toward maximizing the number of future viable actions, thereby promoting constraints satisfaction robust to action noise. Furthermore, we show that by relaxing strict safety constraints through penalties, the constrained RL problem can be approximated arbitrarily closely by an unconstrained one and thus solved using standard model-free RL. This reformulation preserves both safety and optimality while empirically improving resilience to disturbances. Our results indicate that the connection between entropy regularization and robustness is a promising avenue for further empirical and theoretical investigation, as it enables robust safety in RL through simple reward shaping.
Problem

Research questions and friction points this paper is trying to address.

Achieving robust safety in reinforcement learning under disturbances
Linking entropy regularization to constraints satisfaction robustness
Reformulating constrained RL as unconstrained via penalty relaxation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Entropy regularization enhances action viability
Relaxed safety constraints via penalty reformulation
Reward shaping ensures robust safety