🤖 AI Summary
In multi-robot cooperative tasks, achieving globally optimal cost while guaranteeing zero constraint violations remains a fundamental challenge: safety must be enforced as a hard constraint rather than merely penalized on average.
Method: This paper proposes a safe multi-agent reinforcement learning (MARL) framework that uses an epigraph-form reformulation of constrained optimization to encode safety as a hard constraint. The authors prove that the resulting centralized epigraph-form problem can be solved in a distributed fashion by each agent. Building on this result, they design Def-MARL, an algorithm that combines constrained Markov decision process (CMDP) modeling with centralized training and decentralized execution (CTDE), targeting policy feasibility during training and safety satisfaction during decentralized execution.
Results: Evaluated on 8 simulated tasks across 2 simulators, Def-MARL achieves zero constraint violations while attaining the best overall cost performance among the compared methods. Physical experiments on a Crazyflie quadcopter swarm demonstrate safe cooperative transport and dynamic formation navigation, validating practical applicability.
📝 Abstract
Tasks for multi-robot systems often require the robots to collaborate and complete a team goal while maintaining safety. This problem is usually formalized as a constrained Markov decision process (CMDP), which aims to minimize a global cost while keeping the mean constraint violation below a user-defined threshold. Inspired by real-world robotic applications, we define safety as zero constraint violation. While many safe multi-agent reinforcement learning (MARL) algorithms have been proposed to solve CMDPs, these algorithms suffer from unstable training in this setting. To tackle this, we use the epigraph form for constrained optimization to improve training stability and prove that the centralized epigraph form problem can be solved in a distributed fashion by each agent. This results in a novel centralized training distributed execution MARL algorithm named Def-MARL. Simulation experiments on 8 different tasks across 2 different simulators show that Def-MARL achieves the best overall performance, satisfies safety constraints, and maintains stable training. Real-world hardware experiments on Crazyflie quadcopters demonstrate that, compared with other methods, Def-MARL can safely coordinate agents to complete complex collaborative tasks.
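For context, the epigraph form mentioned in the abstract is a standard transform in constrained optimization; the paper applies a variant of it to the CMDP, and the sketch below shows only the generic idea, not the paper's exact formulation. An auxiliary scalar $z$ is introduced to bound the objective, moving it into the constraint set:

```latex
\min_{x} \; f(x) \quad \text{s.t.} \quad g(x) \le 0
\qquad \Longleftrightarrow \qquad
\min_{x,\, z} \; z \quad \text{s.t.} \quad f(x) \le z, \;\; g(x) \le 0.
```

With the objective recast as a constraint, objective and safety conditions can be handled uniformly as hard constraints, which is the structural property the abstract credits for improved training stability.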