🤖 AI Summary
Jointly optimizing safety and performance in large-scale multi-agent systems remains difficult: existing MARL, MPC, and safety-filtering approaches either lack formal guarantees or scale poorly. This paper proposes MAD-PINN, a decentralized physics-informed machine learning framework. Methodologically, it (1) reformulates state-constrained optimal control as an unconstrained optimization via an epigraph-based lifting; (2) integrates Hamilton–Jacobi (HJ) reachability analysis into a dynamic neighbour selection mechanism to ensure provable safety and local adaptivity; and (3) employs a lightweight physics-informed neural network (PINN) to approximate the value function, trained on reduced-agent systems and deployed in a fully decentralized fashion, with each agent relying solely on local observations for decision-making. Evaluated on multi-agent navigation tasks, the framework achieves superior safety-performance trade-offs, demonstrates strong scalability to hundreds of agents, and supports real-time execution.
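The HJ-based neighbour selection can be illustrated with a minimal sketch. The ranking rule, the toy signed-distance stand-in for the HJ value, and the agent states below are illustrative assumptions, not the paper's actual reachability computation:

```python
import numpy as np

def select_neighbours(states, ego_idx, k, hj_value):
    """Rank the other agents by a pairwise HJ-style safety value and
    keep the k most safety-critical ones (lowest value = closest to
    being unsafe)."""
    vals = []
    for j in range(len(states)):
        if j == ego_idx:
            continue
        vals.append((hj_value(states[ego_idx], states[j]), j))
    vals.sort()  # ascending: most safety-critical first
    return [j for _, j in vals[:k]]

def toy_hj_value(xi, xj, radius=0.5):
    # Illustrative stand-in for the HJ value function:
    # signed inter-agent distance minus a collision margin.
    return np.linalg.norm(xi[:2] - xj[:2]) - 2 * radius

states = np.array([[0.0, 0.0], [0.3, 0.4], [5.0, 5.0], [1.0, 0.0]])
print(select_neighbours(states, 0, 2, toy_hj_value))  # → [1, 3]
```

Each agent would rerun this ranking at every decision step, so the set of attended neighbours adapts as interactions change.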
📝 Abstract
Co-optimizing safety and performance in large-scale multi-agent systems remains a fundamental challenge. Existing approaches based on multi-agent reinforcement learning (MARL), safety filtering, or model predictive control (MPC) either lack strict safety guarantees, suffer from conservatism, or fail to scale effectively. We propose MAD-PINN, a decentralized physics-informed machine learning framework for solving the multi-agent state-constrained optimal control problem (MASC-OCP). Our method leverages an epigraph-based reformulation of the SC-OCP to simultaneously capture performance and safety, and approximates its solution via a physics-informed neural network. Scalability is achieved by training the SC-OCP value function on reduced-agent systems and deploying it in a decentralized fashion, where each agent relies only on local observations of its neighbours for decision-making. To further enhance safety and efficiency, we introduce a Hamilton–Jacobi (HJ) reachability-based neighbour selection strategy to prioritize safety-critical interactions, and a receding-horizon policy execution scheme that adapts to dynamic interactions while reducing computational burden. Experiments on multi-agent navigation tasks demonstrate that MAD-PINN achieves superior safety-performance trade-offs, maintains scalability as the number of agents grows, and consistently outperforms state-of-the-art baselines.
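The physics-informed training idea, minimizing a squared HJB-type PDE residual over sampled states, can be sketched in JAX. The tiny MLP, the single-integrator Hamiltonian, and the random sampling are placeholder assumptions for illustration, not the paper's MASC-OCP formulation:

```python
import jax
import jax.numpy as jnp

def mlp(params, x):
    # Tiny placeholder network approximating the scalar value V(x).
    w1, b1, w2, b2 = params
    return jnp.tanh(x @ w1 + b1) @ w2 + b2

def hamiltonian(x, grad_v):
    # Placeholder Hamiltonian: single-integrator dynamics with
    # bounded control, H(x, ∇V) = min_{||u||≤1} ∇V·u = -||∇V||.
    return -jnp.linalg.norm(grad_v)

def residual_loss(params, xs):
    # Mean squared stationary HJB residual over sampled states;
    # driving it to zero makes the network satisfy the PDE.
    def res(x):
        grad_v = jax.grad(lambda y: mlp(params, y)[0])(x)
        return hamiltonian(x, grad_v) ** 2
    return jnp.mean(jax.vmap(res)(xs))

key = jax.random.PRNGKey(0)
k1, k2, k3 = jax.random.split(key, 3)
params = (0.1 * jax.random.normal(k1, (2, 16)), jnp.zeros(16),
          0.1 * jax.random.normal(k2, (16, 1)), jnp.zeros(1))
xs = jax.random.normal(k3, (32, 2))       # sampled training states
loss = residual_loss(params, xs)
print(float(loss))
```

In practice this loss would be minimized with a standard optimizer (e.g. Adam via `jax.grad(residual_loss)`), alongside any boundary or terminal conditions the formulation imposes.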