🤖 AI Summary
This work addresses zero-shot safety for cascade dynamical systems by proposing a safety-aware policy design method based on reduced-order modeling. The approach treats the inner states as a control input to the outer states, which allows a safe reinforcement learning policy to be trained on a simplified model and then transferred zero-shot by pairing it with a low-level tracking controller. Theoretical analysis establishes a lower bound on the probability that the closed-loop system states remain within a prescribed safe set, quantitatively linking the safety guarantee to the tracking performance of the inner-loop controller. Experimental validation on a quadrotor navigation task indicates that the method preserves safety, with the achieved safety level depending on the bandwidth and tracking accuracy of the low-level controller.
📝 Abstract
This paper considers the problem of zero-shot safety guarantees for cascade dynamical systems. These are systems in which a subset of the states (the inner states) affects the dynamics of the remaining states (the outer states), but not vice versa. We define safety as remaining in a set deemed safe for all time with high probability. We propose to train a safe RL policy on a reduced-order model that ignores the dynamics of the inner states and instead treats them as an action that influences the outer states, thereby reducing the complexity of training. When deployed on the full system, the trained policy is combined with a low-level controller whose task is to track the reference provided by the RL policy. Our main theoretical contribution is a bound on the probability of remaining safe in the full-order system. In particular, we establish the interplay between the probability of remaining safe after zero-shot deployment and the quality of the tracking of the inner states. We validate our theoretical findings on a quadrotor navigation task, demonstrating that the preservation of the safety guarantees is tied to the bandwidth and tracking capabilities of the low-level controller.
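To make the cascade structure concrete, the following is a minimal Python sketch, not the paper's implementation. It assumes a hypothetical 1-D system with position `p` as the outer state and velocity `v` as the inner state, a hand-written saturated proportional `policy` standing in for the trained safe RL policy, and a first-order low-level controller with gain `bandwidth`. All names, gains, and the dynamics are illustrative assumptions. The sketch is deterministic, so it only illustrates how the gap between the reduced-order and full-order trajectories shrinks as tracking improves; it does not compute the probabilistic bound.

```python
import numpy as np

def policy(p, p_goal=1.0, gain=2.0, v_max=1.0):
    # Hypothetical stand-in for the trained RL policy:
    # maps the outer state to a reference for the inner state.
    return float(np.clip(gain * (p_goal - p), -v_max, v_max))

def rollout_reduced(p0=0.0, dt=0.01, steps=500):
    # Reduced-order model: the inner state is applied directly as the action.
    p = p0
    traj = [p]
    for _ in range(steps):
        p = p + dt * policy(p)
        traj.append(p)
    return np.array(traj)

def rollout_full(bandwidth, p0=0.0, v0=0.0, dt=0.01, steps=500):
    # Full-order model: the inner state v has its own dynamics, and a
    # first-order low-level controller drives it toward the policy reference.
    p, v = p0, v0
    traj = [p]
    for _ in range(steps):
        v_ref = policy(p)
        p_next = p + dt * v                    # outer state driven by inner state
        v = v + dt * bandwidth * (v_ref - v)   # inner state tracks the reference
        p = p_next
        traj.append(p)
    return np.array(traj)

reduced = rollout_reduced()

# Largest deviation of the full-order outer state from the reduced-order one,
# for a slow, a medium, and a fast low-level controller.
gaps = {
    k: float(np.max(np.abs(rollout_full(k) - reduced)))
    for k in (2.0, 10.0, 50.0)
}

for k, gap in gaps.items():
    print(f"bandwidth={k:5.1f}  max |p_full - p_reduced| = {gap:.4f}")
```

In this toy setting, a faster low-level controller keeps the deployed trajectory closer to the one the policy was trained on, which is the qualitative dependence on bandwidth and tracking quality that the abstract describes.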