🤖 AI Summary
This work addresses global safety in partially observable multi-agent systems, where an agent's locally chosen action does not by itself determine the joint action, and where existing approaches either rely on global state information or use memoryless filters that ignore interaction history. The paper proposes the first safety framework for communication-free Dec-POMDPs: global safety specifications are written in a process algebra with guarded choice and recursion, then compiled into a local Mealy machine for each agent. Each local machine tracks a belief-style set of global machine states and outputs a safe action set, so it needs no global information yet retains memory of past interactions. The pipeline is implemented in Rust and uses PRISM, a probabilistic symbolic model checker, to compute probabilistic safety bounds that hold regardless of the agents' policies. Evaluation on multi-agent path-finding tasks shows substantially fewer collisions than an unshielded baseline and illustrates the trade-off between expressiveness and conservatism across different shield specifications.
📝 Abstract
Multi-agent systems under partial observation often struggle to maintain safety because each agent's locally chosen action does not, in general, determine the resulting joint action. Shielding addresses this by filtering actions based on the current state, but most existing techniques either assume access to a shared global state or employ memoryless local filters that cannot take interaction history into account.
We introduce a shield process algebra with guarded choice and recursion for specifying safe global behaviour in communication-free Dec-POMDP settings. A shield process is compiled in three steps: first into a process automaton, then into a global Mealy machine that acts as a safe joint-action filter, and finally, by projection, into one local Mealy machine per agent. The states of each local machine are belief-style subsets of the global Mealy machine's states that are consistent with that agent's observations, and its outputs are per-agent safe action sets.
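To make the projection step concrete, the following is a minimal sketch of belief-set tracking over a global Mealy machine, in the style of a subset construction. It is not the paper's construction: the type names, the two-agent restriction, the choice of joint observations as machine inputs, and the output rule (keep a local action only if it is agent `i`'s component of some safe joint action on every transition the agent cannot rule out) are all assumptions made for illustration. That rule is a simple conservative filter and does not by itself show that independently chosen local actions combine into a safe joint action.

```rust
use std::collections::{BTreeMap, BTreeSet};

// All names and types below are hypothetical, chosen for illustration only.
type State = usize;
type Obs = u8; // one agent's local observation
type Act = u8; // one agent's local action
type JointObs = [Obs; 2];
type JointAct = [Act; 2];

/// Global Mealy machine: each (state, joint observation) pair maps to the
/// next state and the set of safe joint actions.
struct GlobalMealy {
    step: BTreeMap<(State, JointObs), (State, BTreeSet<JointAct>)>,
}

/// Local machine state for one agent: the global states it cannot rule out.
type Belief = BTreeSet<State>;

/// Advance agent `i`'s belief on local observation `o`. Returns the new belief
/// and the local actions allowed on every belief-consistent transition.
/// If no transition matches, both returned sets are empty.
fn local_step(g: &GlobalMealy, i: usize, belief: &Belief, o: Obs) -> (Belief, BTreeSet<Act>) {
    let mut next = Belief::new();
    let mut allowed: Option<BTreeSet<Act>> = None;
    for ((q, jo), (q2, safe)) in &g.step {
        // Skip transitions that contradict the belief or the observation.
        if !belief.contains(q) || jo[i] != o {
            continue;
        }
        next.insert(*q2);
        // Agent i's components of the safe joint actions on this transition.
        let mine: BTreeSet<Act> = safe.iter().map(|ja| ja[i]).collect();
        allowed = Some(match allowed {
            None => mine,
            Some(acc) => acc.intersection(&mine).copied().collect(),
        });
    }
    (next, allowed.unwrap_or_default())
}

fn set(xs: &[JointAct]) -> BTreeSet<JointAct> {
    xs.iter().copied().collect()
}

/// Toy machine: state 0 = "apart", state 1 = "adjacent"; observation 1 means
/// "sees the other agent"; action 0 = wait, action 1 = move.
fn example_machine() -> GlobalMealy {
    let mut g = GlobalMealy { step: BTreeMap::new() };
    g.step.insert((0, [0, 0]), (0, set(&[[0, 0], [0, 1], [1, 0], [1, 1]])));
    g.step.insert((0, [0, 1]), (1, set(&[[0, 0], [0, 1]])));
    g.step.insert((0, [1, 1]), (1, set(&[[0, 0], [1, 0]])));
    g.step.insert((1, [1, 1]), (1, set(&[[0, 0]])));
    g
}
```

In this toy machine, agent 0 starting from belief `{0}` and observing `0` cannot tell whether agent 1 has seen it, so `local_step(&example_machine(), 0, &belief, 0)` widens the belief to `{0, 1}` and permits only the wait action. This is the kind of conservatism that a history-aware local filter trades against expressiveness.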
We implement the pipeline in Rust and integrate PRISM, the Probabilistic Symbolic Model Checker, to compute best- and worst-case safety probabilities independently of the agents' policies. A multi-agent path-finding case study demonstrates how different shield processes substantially reduce collisions compared to the unshielded baseline while exhibiting varying levels of expressiveness and conservatism.
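Policy-independent best- and worst-case bounds of this kind correspond to minimising and maximising a reachability probability over all resolutions of nondeterminism in a Markov decision process. The sketch below illustrates that idea with plain explicit-state value iteration on a three-state toy model. It is not PRISM's symbolic algorithm and does not use PRISM's API, and the `Mdp` type, the function names, and the example probabilities are assumptions made for illustration.

```rust
/// One nondeterministic choice: a probability distribution over successors.
type Dist = Vec<(usize, f64)>;

/// Tiny explicit-state MDP (hypothetical type): `choices[s]` lists the
/// distributions available in state `s`; no choices means `s` is absorbing.
struct Mdp {
    choices: Vec<Vec<Dist>>,
}

/// Approximate, for every state, the probability of eventually reaching a
/// state flagged in `bad`, maximised or minimised over all choices, using
/// `iters` rounds of value iteration started from 0 on non-bad states.
fn reach_prob(mdp: &Mdp, bad: &[bool], maximise: bool, iters: usize) -> Vec<f64> {
    let mut v: Vec<f64> = bad.iter().map(|&b| if b { 1.0 } else { 0.0 }).collect();
    for _ in 0..iters {
        let mut next = v.clone();
        for (s, dists) in mdp.choices.iter().enumerate() {
            if bad[s] || dists.is_empty() {
                continue; // bad and absorbing states keep their value
            }
            // Expected value of each available choice under the current estimate.
            let vals = dists
                .iter()
                .map(|d| d.iter().map(|&(t, p)| p * v[t]).sum::<f64>());
            next[s] = if maximise {
                vals.fold(0.0, f64::max)
            } else {
                vals.fold(1.0, f64::min)
            };
        }
        v = next;
    }
    v
}

/// Toy model: state 0 = start, state 1 = collision, state 2 = goal.
fn example_mdp() -> (Mdp, Vec<bool>) {
    let cautious = vec![(0, 0.5), (2, 0.45), (1, 0.05)];
    let risky = vec![(2, 0.5), (1, 0.5)];
    let mdp = Mdp { choices: vec![vec![cautious, risky], vec![], vec![]] };
    (mdp, vec![false, true, false])
}
```

With the toy model, the worst-case safety probability from the start state is one minus the maximised collision probability, and the best case is one minus the minimised one. In the setting described above, the nondeterministic choices would be the actions a shield leaves available to the agents, which is why the resulting bounds hold for any policy that respects the shield.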