🤖 AI Summary
This work addresses the challenge of providing provably safe control in settings with unknown or stochastic system dynamics and continuous state-action spaces, where existing methods often fall short. The paper introduces the first framework that integrates conformal prediction with reachability analysis to establish probabilistic safety guarantees. Specifically, conformal prediction is employed to construct valid uncertainty sets for the unknown dynamics, within which reachability analysis is performed to verify safety and guide the training of safe reinforcement learning policies. This approach overcomes the limitations of traditional methods that rely on known or deterministic models, offering theoretically grounded probabilistic safety bounds for nonlinear systems. Evaluated across seven tasks spanning four domains -- cartpole, lane following, drone control, and safe navigation -- the method achieves state-of-the-art provable safety guarantees while maintaining high average reward.
📝 Abstract
Designing provably safe control is a core problem in trustworthy autonomy. However, most prior work assumes either that the system dynamics are known or deterministic, or that the state and action spaces are finite, which significantly limits its applicability. We address this limitation by developing a probabilistic verification framework for unknown dynamical systems that combines conformal prediction with reachability analysis. In particular, we use conformal prediction to obtain valid uncertainty intervals for the unknown dynamics at each time step; reachability analysis then verifies whether safety is maintained within the conformal uncertainty bounds. Next, we develop an algorithmic approach for training control policies that optimize nominal reward while also maximizing the planning horizon over which sound probabilistic safety guarantees hold. We evaluate the proposed approach in seven safe control settings spanning four domains -- cartpole, lane following, drone control, and safe navigation -- for both affine and nonlinear safety specifications. Our experiments show that the policies we learn achieve the strongest provable safety guarantees while still maintaining high average reward.
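To make the first step concrete -- calibrating per-step uncertainty intervals for a learned dynamics model with split conformal prediction, then checking a safety constraint against the resulting interval -- here is a minimal sketch. All names (`conformal_dynamics_interval`, `box_safe`, the nominal model, the bound) are hypothetical illustrations, not the paper's implementation; the actual framework propagates reachable sets over multiple steps and supports nonlinear safety specifications.

```python
import numpy as np

def conformal_dynamics_interval(model, calib_states, calib_actions, calib_next, alpha=0.05):
    """Split conformal prediction: use held-out calibration data to turn a
    point-prediction dynamics model into per-dimension prediction intervals
    that cover the true next state with probability >= 1 - alpha
    (marginally, per dimension, under exchangeability)."""
    preds = np.array([model(s, a) for s, a in zip(calib_states, calib_actions)])
    residuals = np.abs(calib_next - preds)            # nonconformity scores
    n = len(residuals)
    # finite-sample-corrected quantile level: ceil((n + 1)(1 - alpha)) / n
    q_level = min(1.0, np.ceil((n + 1) * (1 - alpha)) / n)
    radius = np.quantile(residuals, q_level, axis=0)  # per-dimension interval radius

    def interval(state, action):
        center = model(state, action)
        return center - radius, center + radius       # conformal box around the prediction
    return interval

def box_safe(lo, hi, bound):
    """One-step check of an affine safety spec |x_i| <= bound:
    every state inside the conformal box must satisfy it."""
    return bool(np.all(lo >= -bound) and np.all(hi <= bound))

# Usage with a toy linear system (hypothetical nominal model, for illustration only)
rng = np.random.default_rng(0)
model = lambda s, a: 0.9 * s + 0.1 * a
S, A = rng.normal(size=(200, 2)), rng.normal(size=(200, 2))
S_next = model(S, A) + rng.normal(scale=0.05, size=(200, 2))  # unknown noise
interval = conformal_dynamics_interval(model, S, A, S_next, alpha=0.1)
lo, hi = interval(np.zeros(2), np.zeros(2))
```

A policy trainer in this style would only certify an action when `box_safe` holds for the conformal box, so the probabilistic coverage of the interval transfers directly to the safety verdict.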