🤖 AI Summary
Problem: Enforcing safety in reinforcement learning is difficult because explicit safety criteria and accurate system dynamics are rarely available a priori.
Method: This paper proposes a safety framework for model-free RL grounded in reversibility: state reversibility serves as an implicit safety criterion that requires no prior knowledge of the safety specification. During policy training, Model Predictive Path Integral (MPPI) control checks in real time whether each action proposed by the learned policy is reversible, and irreversible (i.e., potentially unsafe) actions are intercepted using only black-box queries to the environment dynamics. Safety evaluation is decoupled from policy optimization, so the safety check operates independently of the learning algorithm.
Contribution/Results: In the reported experiments, the approach aborts before every unsafe action while achieving training progress comparable to a PPO baseline that is allowed to violate safety. It is the first work to introduce reversibility as a principled foundation for RL safety control, offering a theoretically interpretable, lightweight, and general-purpose safety paradigm for implicit safety constraints.
📝 Abstract
Model-free reinforcement learning approaches are promising for control but typically lack formal safety guarantees. Existing methods to shield or otherwise provide these guarantees often rely on detailed knowledge of the safety specifications. Instead, this work's insight is that many difficult-to-specify safety issues are best characterized by invariance. Accordingly, we propose to leverage reversibility as a method for preventing these safety issues throughout the training process. Our method uses model-predictive path integral control to check the safety of an action proposed by a learned policy throughout training. A key advantage of this approach is that it only requires the ability to query the black-box dynamics, not explicit knowledge of the dynamics or safety constraints. Experimental results demonstrate that the proposed algorithm successfully aborts before all unsafe actions, while still achieving comparable training progress to a baseline PPO approach that is allowed to violate safety.
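The reversibility check described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the 1-D dynamics `step`, the hazard threshold `CLIFF`, the helper names, and all hyperparameters are hypothetical, and the planner is a simplified MPPI-style sampler. The sketch treats an action as reversible if any sampled rollout from the resulting state comes back to within a tolerance of the original state, querying the dynamics only as a black box.

```python
import numpy as np

CLIFF = 5.0  # hypothetical hazard: states beyond this point can never be left

def step(x, a):
    """Toy black-box dynamics (an assumption): a 1-D point that gets stuck past CLIFF."""
    if x > CLIFF:
        return x  # absorbing region: no action has any effect
    return x + float(np.clip(a, -1.0, 1.0))

def closest_approach(step_fn, x0, actions, target):
    """Roll out an action sequence and return the smallest distance to target."""
    x, best = x0, abs(x0 - target)
    for a in actions:
        x = step_fn(x, a)
        best = min(best, abs(x - target))
    return best

def return_distance(step_fn, x0, target, horizon=10, samples=64,
                    iters=5, lam=0.5, sigma=0.5, seed=0):
    """MPPI-style search for a way back to target, using only step_fn queries."""
    rng = np.random.default_rng(seed)
    nominal = np.zeros(horizon)  # nominal action sequence, refined each iteration
    best = abs(x0 - target)
    for _ in range(iters):
        # Perturb the nominal sequence and score every candidate rollout.
        candidates = nominal + rng.normal(0.0, sigma, size=(samples, horizon))
        costs = np.array([closest_approach(step_fn, x0, c, target)
                          for c in candidates])
        best = min(best, float(costs.min()))
        # Path-integral update: exponentially weight low-cost candidates.
        weights = np.exp(-(costs - costs.min()) / lam)
        nominal = (weights / weights.sum()) @ candidates
    return best

def is_reversible(step_fn, x, a, tol=0.1, **kwargs):
    """True if some sampled rollout returns to x (within tol) after taking a."""
    x_next = step_fn(x, a)
    return bool(return_distance(step_fn, x_next, x, **kwargs) <= tol)

def shielded_step(step_fn, x, a):
    """Execute the policy's action only if it is reversible; otherwise abort."""
    if not is_reversible(step_fn, x, a):
        return x, True  # aborted: state unchanged
    return step_fn(x, a), False
```

In this toy setting, `shielded_step(step, 0.0, 1.0)` executes the action because the point can move back, whereas `shielded_step(step, 4.5, 1.0)` aborts because the action would carry the point past `CLIFF`, from which no sampled rollout returns. Note that the sketch evaluates the proposed action by calling `step` itself, which presumes the black-box dynamics can be queried without committing to the action in the real environment.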