🤖 AI Summary
Zero-shot transfer of multi-agent reinforcement learning (MARL) policies from simulation to real-world autonomous vehicles remains challenging because of discrepancies between simulated and real dynamics, state and model uncertainties, and the need for safety guarantees in both domains. Method: This paper proposes RSR-RSMARL, a framework that integrates vehicle-to-vehicle (V2V) communication and shared-state modeling, jointly designs robust policy representations with a modular safety shield based on control barrier functions (CBFs), and enables simulation-trained policies to be deployed directly on hardware. Contribution/Results: It introduces the first Real-Sim-Real closed-loop adaptation mechanism for MARL. Evaluated on F1/10th-scale autonomous vehicles, RSR-RSMARL significantly reduces collision rates in multi-vehicle cooperative driving and achieves zero-shot real-world deployment across diverse traffic configurations, demonstrating strong safety, coordination, and generalization.
📝 Abstract
Deep multi-agent reinforcement learning (MARL) has been shown to be effective in simulation for many multi-robot problems. For autonomous vehicles, advances in vehicle-to-vehicle (V2V) communication technology provide opportunities to further enhance system safety. However, zero-shot transfer of simulator-trained MARL policies to physical dynamic systems remains challenging, and the use of communication and shared information in MARL has seen limited demonstration on hardware. The problem is complicated by discrepancies between simulated and physical states, uncertainties in system states and models, the practical design of shared information, and the need for safety guarantees in both simulation and hardware. This paper introduces RSR-RSMARL, a novel Robust and Safe MARL framework that supports Real-Sim-Real (RSR) policy adaptation for multi-agent systems with inter-agent communication, and demonstrates it in both simulation and hardware. RSR-RSMARL formulates the MARL problem using state representations (including state information shared among agents) and action representations that account for real-system complexities. The MARL policy is trained with a robust MARL algorithm so that it transfers zero-shot to hardware despite the sim-to-real gap. A safety shield module based on Control Barrier Functions (CBFs) provides safety guarantees for each individual agent. Experimental results on F1/10th-scale autonomous vehicles with V2V communication demonstrate that the RSR-RSMARL framework enhances driving safety and coordination across multiple configurations. These findings highlight the importance of jointly designing robust policy representations and modular safety architectures to enable scalable, generalizable RSR transfer in multi-agent autonomy.
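A CBF safety shield, in general, minimally modifies an action proposed by a learned policy so that the barrier condition dh/dt + alpha * h >= 0 keeps each agent inside a safe set {h >= 0}. The abstract does not specify the barrier, vehicle model, or solver used in RSR-RSMARL, so the following is only an illustrative sketch under stated assumptions: a 1-D longitudinal double-integrator model for the ego vehicle, a hypothetical time-headway barrier h = gap - d_min - tau * v_ego, the lead vehicle's state received over V2V, and made-up parameter values (`d_min`, `tau`, `alpha`, actuator limits). The names `VehicleState`, `barrier`, and `cbf_shield` are hypothetical. With a single affine constraint on the acceleration, the usual CBF quadratic program has a closed-form solution, so no QP solver is needed.

```python
from dataclasses import dataclass


@dataclass
class VehicleState:
    """Longitudinal state of one vehicle (hypothetical 1-D lane model)."""
    position: float  # m, along the lane
    velocity: float  # m/s


def barrier(ego: VehicleState, lead: VehicleState, d_min: float, tau: float) -> float:
    """Hypothetical time-headway barrier: h = gap - d_min - tau * v_ego (safe if h >= 0)."""
    gap = lead.position - ego.position
    return gap - d_min - tau * ego.velocity


def cbf_shield(
    u_rl: float,
    ego: VehicleState,
    lead: VehicleState,
    d_min: float = 0.5,   # assumed minimum standstill gap, m
    tau: float = 0.8,     # assumed time headway, s
    alpha: float = 1.0,   # assumed class-K gain
    u_min: float = -3.0,  # assumed maximum braking, m/s^2
    u_max: float = 2.0,   # assumed maximum acceleration, m/s^2
) -> float:
    """Minimally modify the policy's acceleration u_rl so that
    dh/dt + alpha * h >= 0 holds for a double-integrator ego vehicle.

    With dh/dt = v_lead - v_ego - tau * u, the constraint becomes
    u <= (v_lead - v_ego + alpha * h) / tau, so the one-constraint QP
    min (u - u_rl)^2 is solved by taking the smaller of u_rl and that
    bound, then clipping to the actuator limits.
    """
    h = barrier(ego, lead, d_min, tau)
    u_cbf = (lead.velocity - ego.velocity + alpha * h) / tau
    u_safe = min(u_rl, u_cbf)
    # If u_cbf < u_min, clipping applies maximum braking, which no longer
    # satisfies the CBF bound: the guarantee only holds while it is feasible.
    return max(u_min, min(u_max, u_safe))


if __name__ == "__main__":
    ego = VehicleState(position=0.0, velocity=2.0)
    lead = VehicleState(position=3.0, velocity=1.0)  # e.g. shared over V2V
    # The policy asks to accelerate; the shield substitutes mild braking (about -0.125).
    print(cbf_shield(1.5, ego, lead))
```

Because the shield acts per agent on local and shared state, it can be attached to any trained policy without retraining, which is one reading of the "modular" safety architecture the abstract describes; the paper's actual shield may use a different barrier, a multi-constraint QP, or richer vehicle dynamics.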