Discrete GCBF Proximal Policy Optimization for Multi-agent Safe Optimal Control

๐Ÿ“… 2025-02-05
๐Ÿ“ˆ Citations: 0
โœจ Influential: 0
๐Ÿ“„ PDF
๐Ÿค– AI Summary
This paper addresses the distributed safe optimal control problem for multi-agent systems under unknown discrete-time dynamics, partial observability, time-varying communication topologies, and input constraints. The authors propose a unified framework integrating graph neural network (GNN)-based representation learning, distributed control barrier functions (CBFs), and high-performance policy optimization. The key innovation is the first end-to-end co-training of discrete-time graph CBFs and policies, enabled by an adaptive safety layer that jointly handles dynamic neighborhoods and input bounds, without requiring any prior high-performance nominal policy. The framework achieves a Pareto-optimal trade-off between safety and performance: across three distinct simulation tasks, it attains safety rates comparable to the most conservative baseline while nearly matching the task performance of the unconstrained optimal baseline, and its hyperparameters generalize well across diverse environments.

๐Ÿ“ Abstract
Control policies that achieve high task performance while satisfying safety constraints are desirable for any system, including multi-agent systems (MAS). One promising technique for ensuring the safety of MAS is distributed control barrier functions (CBFs). However, it is difficult to design distributed CBF-based policies for MAS that can tackle unknown discrete-time dynamics, partial observability, changing neighborhoods, and input constraints, especially when a distributed high-performance nominal policy that can achieve the task is unavailable. To tackle these challenges, we propose DGPPO, a new framework that simultaneously learns both a discrete graph CBF, which handles neighborhood changes and input constraints, and a distributed high-performance safe policy for MAS with unknown discrete-time dynamics. We empirically validate our claims on a suite of multi-agent tasks spanning three different simulation engines. The results suggest that, compared with existing methods, our DGPPO framework obtains policies that achieve high task performance (matching baselines that ignore the safety constraints) and high safety rates (matching the most conservative baselines), with a constant set of hyperparameters across all environments.
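The abstract's central object, a discrete-time CBF, enforces safety via a one-step decrease condition: candidate actions are acceptable only if the barrier value at the predicted next state does not shrink too fast. The sketch below is a minimal, hypothetical illustration of that condition on a toy 1-D system; the function names (`discrete_cbf_safe`, `filter_actions`), the dynamics model, and the choice of `alpha` are illustrative assumptions, not the paper's learned GNN-based CBF.

```python
def discrete_cbf_safe(h, step, x, u, alpha=0.5):
    """Check the discrete-time CBF condition
    h(x_{t+1}) - h(x_t) >= -alpha * h(x_t),
    equivalently h(x_{t+1}) >= (1 - alpha) * h(x_t).
    h: barrier function (h >= 0 on the safe set);
    step: one-step dynamics model (learned in the paper's setting)."""
    return h(step(x, u)) >= (1.0 - alpha) * h(x)

def filter_actions(h, step, x, candidates, alpha=0.5):
    """Keep only candidate actions that satisfy the CBF condition."""
    return [u for u in candidates if discrete_cbf_safe(h, step, x, u, alpha)]

# Toy example: 1-D single integrator x_{t+1} = x + u,
# safe set {x >= 0} encoded by h(x) = x.
h = lambda x: x
step = lambda x, u: x + u

safe = filter_actions(h, step, x=1.0, candidates=[-2.0, -0.4, 0.3], alpha=0.5)
print(safe)  # keeps [-0.4, 0.3]; u = -2.0 violates h(x') >= 0.5 * h(x)
```

With `x = 1.0` and `alpha = 0.5`, the condition requires `h(x') >= 0.5`, so the large negative action is filtered out while the other two pass.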
Problem

Research questions and friction points this paper is trying to address.

Ensuring safety in multi-agent systems with unknown discrete-time dynamics.
Handling changing neighborhoods and input constraints, without access to a high-performance nominal policy.
Achieving high task performance while maintaining safety.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Discrete graph CBF that handles changing neighborhoods and input constraints
Distributed safe policy for MAS with unknown discrete-time dynamics
End-to-end co-training (DGPPO) that achieves both high performance and safety
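The co-training idea above pairs a PPO-style policy objective with the learned CBF's safety condition. The snippet below is a heavily simplified, hypothetical sketch of such a combined loss: a standard clipped PPO surrogate plus a penalty on one-step CBF violations. The penalty form, the names `ppo_cbf_loss`, `nu`, and `clip`, and the specific weighting are assumptions for illustration, not DGPPO's actual objective.

```python
import numpy as np

def ppo_cbf_loss(ratio, advantage, cbf_violation, clip=0.2, nu=10.0):
    """Clipped PPO surrogate augmented with a CBF-violation penalty.
    ratio: pi_new(a|s) / pi_old(a|s) per sample;
    cbf_violation: max(0, (1 - alpha) * h(x) - h(x')) per sample,
    i.e. how far the discrete CBF condition is from holding.
    nu (assumed) weights safety against task performance."""
    surrogate = np.minimum(ratio * advantage,
                           np.clip(ratio, 1.0 - clip, 1.0 + clip) * advantage)
    # Minimize: negative surrogate (maximize return) + penalty (enforce safety).
    return -np.mean(surrogate) + nu * np.mean(cbf_violation)

# With unchanged policy (ratio = 1), unit advantage, and no violations,
# the loss reduces to the negative mean advantage.
loss = ppo_cbf_loss(np.array([1.0]), np.array([1.0]), np.array([0.0]))
print(loss)  # -1.0
```

The paper's actual mechanism is an adaptive safety layer co-trained with the CBF rather than a fixed penalty weight; this sketch only shows how a safety term can enter a PPO-style objective.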
๐Ÿ”Ž Similar Papers
No similar papers found.