Safety Guarantees in Zero-Shot Reinforcement Learning for Cascade Dynamical Systems

📅 2026-04-11

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses the safety challenges of cascaded dynamical systems in zero-shot reinforcement learning by proposing a safety-aware policy design method based on reduced-order modeling. The approach treats the inner states as a control input to the outer states, which allows a safe reinforcement learning policy to be trained on a simplified model, and relies on a low-level tracking controller to achieve zero-shot transfer to the full system. Theoretical analysis establishes a lower bound on the probability that the closed-loop system states remain within a prescribed safe set, quantitatively linking the safety guarantee to the tracking performance of the inner-loop controller. Experimental validation on a quadrotor navigation task shows that the method preserves safety, with the achieved safety level depending on the bandwidth and tracking accuracy of the low-level controller.
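
The dependence of safety on the low-level controller can be illustrated with a toy simulation. The sketch below is not the paper's method or experiment: it assumes a scalar outer state `x`, a scalar inner state `y` tracked by a first-order controller, and a hand-written proportional rule standing in for the trained safe RL policy. All names and constants (`X_GOAL`, `X_SAFE`, `reduced_order_policy`, `rollout`, `estimate_safety`) are hypothetical.

```python
import numpy as np

X_GOAL = 0.8  # hypothetical target for the outer state
X_SAFE = 1.0  # hypothetical safe set: |x| <= X_SAFE


def reduced_order_policy(x, gain=4.0):
    """Stand-in for the trained safe RL policy (an assumption, not learned).

    In the reduced-order model the inner state is treated as the action,
    so the returned value is a reference y_ref for the inner state.
    """
    return gain * (X_GOAL - x)


def rollout(bandwidth, rng, noise_std=0.05, steps=500, dt=0.01):
    """Simulate the full cascade system once.

    Returns (stayed_safe, max_tracking_error).
    """
    x, y = 0.0, 0.0  # outer state, inner state
    max_err = 0.0
    for _ in range(steps):
        y_ref = reduced_order_policy(x)
        # Low-level controller: first-order tracking of y_ref.
        # A larger bandwidth means the inner state follows the reference faster.
        y += dt * bandwidth * (y_ref - y)
        # Outer dynamics: driven by the actual inner state plus process noise.
        x += dt * y + np.sqrt(dt) * noise_std * rng.standard_normal()
        max_err = max(max_err, abs(y_ref - y))
        if abs(x) > X_SAFE:
            return False, max_err  # left the safe set
    return True, max_err


def estimate_safety(bandwidth, n_rollouts=200, seed=0, **kwargs):
    """Monte Carlo estimate of the probability of staying in the safe set."""
    rng = np.random.default_rng(seed)
    outcomes = [rollout(bandwidth, rng, **kwargs)[0] for _ in range(n_rollouts)]
    return float(np.mean(outcomes))


if __name__ == "__main__":
    for bw in (1.0, 2.5, 10.0, 50.0):
        print(f"bandwidth={bw:5.1f}  empirical P(safe)={estimate_safety(bw):.2f}")
```

In this toy setup the stand-in policy is safe on the reduced-order model, since `x` approaches `X_GOAL` without overshoot when `y` equals `y_ref` exactly. With a slow inner loop the lag makes the outer state overshoot, so the empirical safety probability drops as the bandwidth decreases.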

📝 Abstract

This paper considers the problem of zero-shot safety guarantees for cascade dynamical systems. These are systems in which a subset of the states (the inner states) affects the dynamics of the remaining states (the outer states), but not vice versa. We define safety as remaining in a set deemed safe for all times with high probability. We propose to train a safe RL policy on a reduced-order model that ignores the dynamics of the inner states and instead treats them as an action that influences the outer states, thereby reducing the complexity of training. When deployed on the full system, the trained policy is combined with a low-level controller whose task is to track the reference provided by the RL policy. Our main theoretical contribution is a bound on the probability of remaining safe in the full-order system. In particular, we establish the interplay between the probability of remaining safe after zero-shot deployment and the quality of the tracking of the inner states. We validate our theoretical findings on a quadrotor navigation task, demonstrating that the preservation of the safety guarantees is tied to the bandwidth and tracking capabilities of the low-level controller.
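
The setup described in the abstract can be written in symbols as follows. The notation is assumed for illustration and is not taken from the paper: $x_t$ is the outer state, $y_t$ the inner state, $u_t$ the low-level control input, $w_t$ a disturbance, $\pi$ the trained policy, $\mathcal{S}$ the safe set, and $\delta$ a failure tolerance. The discrete-time form and the choice to state safety on the outer state are also assumptions.

```latex
% Full-order cascade system (assumed notation):
% the inner state y enters the outer dynamics, but x does not enter the inner dynamics.
\begin{aligned}
y_{t+1} &= g(y_t, u_t), \\
x_{t+1} &= f(x_t, y_t, w_t).
\end{aligned}

% Reduced-order model used for training:
% the inner state is replaced by an action a_t chosen by the policy.
x_{t+1} = f(x_t, a_t, w_t), \qquad a_t = \pi(x_t).

% Safety: the state remains in the safe set for all times with high probability.
\mathbb{P}\left( x_t \in \mathcal{S} \ \ \forall t \ge 0 \right) \ge 1 - \delta.
```

At deployment, the low-level controller chooses $u_t$ so that $y_t$ tracks the reference $\pi(x_t)$. The paper's bound relates the safety probability of the full-order system to the quality of this tracking; its exact form is not reproduced here.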

Problem

Research questions and friction points this paper is trying to address.

zero-shot reinforcement learning

safety guarantees

cascade dynamical systems

safe policy

reduced-order model

Innovation

Methods, ideas, or system contributions that make the work stand out.

zero-shot reinforcement learning

cascade dynamical systems

safety guarantees