🤖 AI Summary
Ensuring safety for non-convex, discrete-time dynamical systems, such as fixed-wing aircraft and autonomous vehicles performing lane merging with adaptive cruise control, remains challenging because enforcing hard safety constraints under non-convex dynamics is computationally intractable.
Method: This paper proposes an online-executable discrete-time Control Barrier Function (CBF) override mechanism that approximates a reinforcement learning (RL) policy as closely as possible without any safety violations.
Contribution/Results: We introduce a computationally tractable CBF override strategy tailored to non-convex discrete-time dynamics, sidestepping the generally intractable optimal-override problem via lightweight approximations. Evaluated on two real-world non-convex systems, our approach guarantees strict safety (zero safety violations), achieves control performance on par with unconstrained RL baselines, and keeps computational overhead low enough for real-time deployment.
📝 Abstract
Reinforcement Learning (RL) has enabled vast performance improvements for robotic systems. To achieve these results, though, the agent must often explore the environment randomly, which presents a significant challenge for safety-critical systems. Barrier functions can address this challenge by enabling an override that approximates the RL control input as closely as possible without violating a safety constraint. Unfortunately, computing this override can be intractable in cases where the dynamics are not convex in the control input or when time is discrete, as is often the case when training RL systems. We therefore consider these cases, developing novel barrier functions for two non-convex systems (fixed-wing aircraft and self-driving cars performing lane merging with adaptive cruise control) in discrete time. Although solving for an online, optimal override is in general intractable when the dynamics are non-convex in the control input, we investigate approximate solutions, finding that these approximations enable performance commensurate with baseline RL methods with zero safety violations. In particular, even without attempting to solve for the optimal override at all, performance remains competitive with the RL baseline. We discuss the tradeoffs of the approximate override solutions, including performance and computational tractability.
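The override the abstract describes can be illustrated with a minimal sketch. Everything below is an illustrative assumption, not the paper's actual setup: the 1-D dynamics `f`, the barrier `h`, the decay rate `alpha`, and the coarse grid search that stands in for the approximate solutions the paper investigates.

```python
import numpy as np

# Toy discrete-time dynamics, non-convex in the control input u.
# (Illustrative assumption; not one of the paper's systems.)
def f(x, u, dt=0.1):
    return x + dt * np.sin(u)

# Barrier function: the safe set is {x : h(x) >= 0}, here x <= 1.
def h(x):
    return 1.0 - x

def cbf_override(x, u_rl, alpha=0.5, candidates=np.linspace(-2.0, 2.0, 41)):
    """Return the candidate action closest to the RL action u_rl that
    satisfies the discrete-time CBF condition
        h(f(x, u)) >= (1 - alpha) * h(x),
    approximating the (generally intractable) optimal override with a
    coarse grid search over candidate inputs."""
    feasible = [u for u in candidates if h(f(x, u)) >= (1 - alpha) * h(x)]
    if not feasible:
        # No candidate certifies safety: fall back to the action that
        # maximizes the barrier value at the next state.
        return max(candidates, key=lambda u: h(f(x, u)))
    return min(feasible, key=lambda u: abs(u - u_rl))

x = 0.95                        # close to the boundary of the safe set
u_rl = 2.0                      # the RL policy pushes toward the boundary
u_safe = cbf_override(x, u_rl)  # override clips the action back
```

Swapping the grid search for a local solver, or skipping the optimization entirely and applying a fixed safe fallback whenever the RL action is infeasible, recovers the performance-versus-compute tradeoff the abstract alludes to.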