Viability of Future Actions: Robust Safety in Reinforcement Learning via Entropy Regularization

📅 2025-06-12

📈 Citations: 0

✨ Influential: 0

career value

205K/year

🤖 AI Summary

This work addresses the problem of robustly satisfying state constraints for reinforcement learning (RL) policies under unknown disturbances. We propose a synergistic mechanism combining entropy regularization with constraint penalization (Lagrangian or penalty methods). First, we theoretically establish that entropy regularization intrinsically biases policies toward actions with higher future feasibility, thereby enhancing disturbance resilience and safety. Second, we relax strict constrained RL into an approximately equivalent unconstrained RL problem—solvable by standard off-the-shelf algorithms—achieving a principled trade-off among safety, optimality, and robustness. Our model-free approach integrates reward shaping and constraint learning within frameworks such as Soft Actor-Critic (SAC). Evaluated on multiple continuous-control benchmarks, our method reduces constraint violation rates by up to 47% while preserving near-optimal task performance, demonstrating significant improvements in robustness against both action noise and environmental disturbances.

Technology Category

Application Category

📝 Abstract

Despite the many recent advances in reinforcement learning (RL), the question of learning policies that robustly satisfy state constraints under unknown disturbances remains open. In this paper, we offer a new perspective on achieving robust safety by analyzing the interplay between two well-established techniques in model-free RL: entropy regularization, and constraints penalization. We reveal empirically that entropy regularization in constrained RL inherently biases learning toward maximizing the number of future viable actions, thereby promoting constraints satisfaction robust to action noise. Furthermore, we show that by relaxing strict safety constraints through penalties, the constrained RL problem can be approximated arbitrarily closely by an unconstrained one and thus solved using standard model-free RL. This reformulation preserves both safety and optimality while empirically improving resilience to disturbances. Our results indicate that the connection between entropy regularization and robustness is a promising avenue for further empirical and theoretical investigation, as it enables robust safety in RL through simple reward shaping.

Problem

Research questions and friction points this paper is trying to address.

Achieving robust safety in reinforcement learning under disturbances

Linking entropy regularization to constraints satisfaction robustness

Reformulating constrained RL as unconstrained via penalty relaxation

Innovation

Methods, ideas, or system contributions that make the work stand out.

Entropy regularization enhances action viability

Relaxed safety constraints via penalty reformulation

Reward shaping ensures robust safety

🔎 Similar Papers

Safe Reinforcement Learning in Black-Box Environments via Adaptive Shielding