🤖 AI Summary
Multi-agent reinforcement learning (MARL) suffers from poor generalization, low sample efficiency, and weak scalability, especially in complex dynamical systems that lack intrinsic symmetries. Method: This paper introduces the paradigm of *extrinsic symmetry embedding*, which goes beyond conventional equivariant learning that relies solely on a system's inherent symmetries. We propose the first framework for modeling extrinsic symmetries and design the Group Equivariant Graphormer, a unified architecture combining group-equivariant deep learning, Graphormer-based graph neural networks, and distributed MARL, to achieve group-equivariant policy representations even under symmetry-breaking conditions. Contribution/Results: Evaluated on quadcopter swarm control, our approach achieves zero-shot scalability across swarm sizes, significantly reduces collision rates, improves task success rates, and demonstrates strong robustness across diverse scenarios and varying swarm scales.
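The property the summary refers to, a group-equivariant policy, means the policy commutes with the group action: transforming the observation by a group element g and then applying the policy gives the same result as applying the policy first and then transforming the action, i.e. π(g·o) = g·π(o). A minimal toy sketch (not the paper's Graphormer architecture) illustrates this for a linear 2D policy under the C4 group of 90° rotations; the weight form aI + bJ, with J the 90° rotation matrix, is an assumption chosen because such matrices commute with every planar rotation:

```python
import numpy as np

def rot(k):
    """Rotation by k * 90 degrees (an element of the cyclic group C4)."""
    c, s = [(1, 0), (0, 1), (-1, 0), (0, -1)][k % 4]
    return np.array([[c, -s], [s, c]], dtype=float)

# Toy equivariant linear policy: W = a*I + b*J commutes with all rotations,
# so pi(g.o) = g.pi(o) holds exactly. Coefficients a, b are arbitrary.
a, b = 0.7, 0.2
W = a * np.eye(2) + b * rot(1)

def policy(obs):
    """Map a 2D observation to a 2D action with a rotation-equivariant map."""
    return W @ obs

# Check equivariance for every element of C4.
obs = np.array([1.0, 2.0])
for k in range(4):
    g = rot(k)
    assert np.allclose(policy(g @ obs), g @ policy(obs))
```

In practice an equivariant network enforces this constraint layer by layer rather than through a single hand-constructed weight matrix, which is the role the Group Equivariant Graphormer plays in the paper.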
📝 Abstract
Multi-agent reinforcement learning (MARL) has emerged as a powerful framework for enabling agents to learn complex, coordinated behaviors, but it faces persistent challenges in generalization, scalability, and sample efficiency. Recent work has sought to alleviate these issues by embedding a system's intrinsic symmetries into the policy. Yet most dynamical systems exhibit little to no symmetry to exploit. This paper presents a novel framework for embedding extrinsic symmetries into multi-agent system dynamics, enabling symmetry-enhanced methods to address systems with insufficient intrinsic symmetries and expanding the scope of equivariant learning to a wide variety of MARL problems. Central to our framework is the Group Equivariant Graphormer, a group-modular architecture designed for distributed swarming tasks. Extensive experiments on a swarm of symmetry-breaking quadrotors validate the effectiveness of our approach, showcasing improved generalization and zero-shot scalability. Our method achieves significant reductions in collision rates and higher task success rates across a diverse range of scenarios and varying swarm sizes.