🤖 AI Summary
This work addresses the lack of safety guarantees in model-free reinforcement learning for black-box dynamical systems by proposing a fully data-driven safe control framework. The approach integrates Hamilton-Jacobi reachability analysis with model-free learning, leveraging contraction theory to design a tailored loss function that jointly trains two neural networks: one approximating the (generally non-smooth) safe value function and one approximating its derivative. This enables, for the first time, a model-free quadratic-programming (QP) safety filter whose learned value provably converges to the viscosity solution of the underlying Hamilton-Jacobi equation. The resulting filter effectively prevents unsafe behavior during early training, even in complex scenarios such as hybrid systems, thereby significantly improving learning stability and cumulative reward and outperforming strong existing baselines.
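For concreteness, the generic HJ-based QP filtering step the summary alludes to can be written as follows. This is only a sketch: the constraint below is stated with a known dynamics $f$ and a decay rate $\gamma$ for illustration, whereas the paper builds the filter purely from learned, data-driven quantities:

$$
u^{\star}(x) \;=\; \arg\min_{u \in \mathcal{U}} \; \lVert u - u_{\mathrm{nom}}(x) \rVert^2
\quad \text{s.t.} \quad \nabla V(x)^{\top} f(x, u) \;\ge\; -\gamma\, V(x),
$$

where $V$ is the safe value function approximated by one network, $\nabla V$ is its derivative approximated by the second network, and $u_{\mathrm{nom}}$ is the nominal (task) action; the filter returns the minimal perturbation of $u_{\mathrm{nom}}$ that keeps the safety constraint satisfied.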
📝 Abstract
We introduce Deep QP Safety Filter, a fully data-driven safety layer for black-box dynamical systems. Our method learns a quadratic-program (QP) safety filter without model knowledge by combining Hamilton-Jacobi (HJ) reachability with model-free learning. We construct contraction-based losses for both the safety value function and its derivative, and train two neural networks accordingly. In the exact setting, the learned critic converges to the viscosity solution (and its derivative), even when the value function is non-smooth. Across diverse dynamical systems, including a hybrid system, and multiple RL tasks, Deep QP Safety Filter substantially reduces pre-convergence failures while accelerating learning toward higher returns than strong baselines, offering a principled and practical route to safe, model-free control.
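To make the filtering step concrete, below is a minimal hedged sketch in Python. It assumes the learned HJ safety constraint can be reduced to a single half-space constraint linear in the control, $a(x)^{\top} u + b(x) \ge 0$, with coefficients assembled from the two trained networks; the names `qp_safety_filter`, `a`, `b`, `grad_net`, `value_net`, and `gamma` are illustrative assumptions, not the paper's actual API.

```python
import numpy as np

# Sketch of one QP safety-filter step, assuming the learned HJ constraint
# is linear in the control: a(x)^T u + b(x) >= 0. The coefficient names
# below (grad_net, value_net, gamma) are hypothetical stand-ins for the
# paper's two trained networks and constraint construction.

def qp_safety_filter(u_nom: np.ndarray, a: np.ndarray, b: float) -> np.ndarray:
    """Project the nominal action onto the half-space a^T u + b >= 0.

    For a single linear constraint, the QP
        min_u ||u - u_nom||^2  s.t.  a^T u + b >= 0
    has the closed-form projection below; with control limits or several
    constraints one would call a general QP solver (e.g. OSQP) instead.
    """
    slack = float(a @ u_nom + b)
    if slack >= 0.0:
        # Nominal action already satisfies the safety constraint.
        return u_nom
    # Minimal correction along the constraint normal.
    return u_nom - (slack / float(a @ a)) * a

# Hypothetical usage with constraint coefficients from the learned nets:
u_nom = np.array([0.5, -0.2])   # nominal RL action
a = np.array([1.0, 2.0])        # e.g. derived from grad_net(x)
b = -1.0                        # e.g. involving gamma * value_net(x)
u_safe = qp_safety_filter(u_nom, a, b)
print(u_safe)                   # filtered action on the constraint boundary
```

With these example numbers the nominal action violates the constraint (slack = -0.9), and the filter returns the closest action on the constraint boundary; when the nominal action is already safe, it passes through unchanged.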