Follow the STARs: Dynamic ω-Regular Shielding of Learned Policies

📅 2025-04-11
🏛️ arXiv.org
📈 Citations: 0
Influential: 0
🤖 AI Summary
This paper addresses the challenge of simultaneously ensuring ω-regular safety (avoiding undesirable events) and liveness (guaranteeing eventual occurrence of desirable events) in learned probabilistic policies at runtime. To this end, the authors propose STARs, a dynamic runtime shielding framework and the first to support dynamic post-hoc shielding for the full class of ω-regular properties. STARs combines permissive strategy templates, ω-automaton construction, game-theoretic model checking, and real-time monitoring with reconfiguration, enabling specification evolution and adaptive intervention under actuator failures. A key innovation is a tunable intervention mechanism that dynamically balances formal assurance strength against task performance. Evaluated on a mobile robot benchmark, STARs demonstrates low-overhead, highly controllable shielding for incrementally updated ω-regular specifications, significantly enhancing both the practicality and the trustworthiness of learned policies.

📝 Abstract
This paper presents a novel dynamic post-shielding framework that enforces the full class of ω-regular correctness properties over pre-computed probabilistic policies. This constitutes a paradigm shift from the predominant setting of safety-shielding -- i.e., ensuring that nothing bad ever happens -- to a shielding process that additionally enforces liveness -- i.e., ensures that something good eventually happens. At the core, our method uses Strategy-Template-based Adaptive Runtime Shields (STARs), which leverage permissive strategy templates to enable post-shielding with minimal interference. As its main feature, STARs introduce a mechanism to dynamically control interference, allowing a tunable enforcement parameter to balance formal obligations and task-specific behavior at runtime. This allows triggering more aggressive enforcement when needed, while allowing for optimized policy choices otherwise. In addition, STARs support runtime adaptation to changing specifications or actuator failures, making them especially suited for cyber-physical applications. We evaluate STARs on a mobile robot benchmark to demonstrate their controllable interference when enforcing (incrementally updated) ω-regular correctness properties over learned probabilistic policies.
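The core loop the abstract describes can be pictured as follows: a monitor automaton tracks the ω-regular obligation, a permissive strategy template tells the shield which actions keep the objective winnable from the current monitor state, and an enforcement parameter tunes how strongly the learned policy is biased toward those actions. Below is a minimal Python sketch of this idea; it is not the paper's implementation, and all names (`RuntimeShield`, `template`, `delta`, `lam`) are illustrative assumptions. One simple way to realize tunable interference, used here, is to damp the probability mass of template-disallowed actions by a factor of `1 - lam` and renormalize.

```python
class RuntimeShield:
    """Hedged sketch of a post-shield in the style the abstract describes
    (names are illustrative, not the paper's API). A permissive strategy
    template maps each monitor-automaton state to the set of actions that
    keep the omega-regular objective winnable; lam in [0, 1] tunes how
    strongly the shield biases the learned policy toward those actions."""

    def __init__(self, template, transitions, init_state, lam=1.0):
        self.template = template   # monitor state -> set of allowed actions
        self.delta = transitions   # (monitor state, action) -> next state
        self.state = init_state    # current monitor state
        self.lam = lam             # enforcement strength

    def shield(self, policy_dist):
        """Reweight the policy's action distribution: allowed actions keep
        their mass; disallowed actions are damped by (1 - lam). lam = 1
        enforces the template strictly; lam = 0 means no interference."""
        allowed = self.template[self.state]
        weights = {a: p * (1.0 if a in allowed else 1.0 - self.lam)
                   for a, p in policy_dist.items()}
        total = sum(weights.values())
        if total == 0.0:           # policy mass was entirely disallowed
            weights = {a: 1.0 for a in allowed}
            total = float(len(allowed))
        return {a: w / total for a, w in weights.items()}

    def step(self, action):
        """Advance the monitor automaton after the chosen action executes."""
        self.state = self.delta[(self.state, action)]


# Toy instance: in monitor state "q0" the template allows {"left", "stay"}.
template = {"q0": {"left", "stay"}, "q1": {"stay"}}
delta = {("q0", "left"): "q1", ("q0", "right"): "q0",
         ("q0", "stay"): "q0", ("q1", "stay"): "q1"}
shield = RuntimeShield(template, delta, "q0", lam=1.0)

policy = {"right": 0.9, "left": 0.1}   # learned policy prefers "right"
strict = shield.shield(policy)         # lam = 1: "right" fully suppressed
shield.lam = 0.5
soft = shield.shield(policy)           # lam = 0.5: "right" only damped
```

With `lam = 1.0` the disallowed action `"right"` gets zero mass and `"left"` absorbs everything; with `lam = 0.5` the shield only partially interferes, matching the abstract's trade-off between formal obligations and task-specific behavior. A real instantiation would derive `template` and `delta` from game-theoretic model checking of the ω-automaton, as the summary outlines.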
Problem

Research questions and friction points this paper is trying to address.

Enforcing ω-regular correctness on learned policies
Balancing formal obligations and runtime behavior
Adapting to changing specifications or failures dynamically
Innovation

Methods, ideas, or system contributions that make the work stand out.

Dynamic ω-regular shielding for policy enforcement
Tunable interference balancing formal obligations
Runtime adaptation to specification changes