Expected Return Symmetries

📅 2025-02-03

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

In multi-agent zero-shot coordination, agents that independently break symmetries in mutually incompatible ways fail to coordinate, which undermines interoperability. Method: We propose "expected return symmetry", an environment-agnostic symmetry structure that generalizes conventional environmental symmetries while requiring minimal environmental priors and no ground-truth symmetry annotations. We formally define its group structure and show that environmental symmetries form a subgroup of it. Symmetry learning therefore shifts from modeling the environment to finding transformations under which policy-induced expected returns are invariant. Building on decentralized partially observable Markov decision processes (Dec-POMDPs), we develop a symmetry-constrained policy optimization framework and a zero-shot coordination evaluation protocol. Results: On standard multi-agent benchmarks, agents trained for compatibility under expected return symmetries achieve better zero-shot coordination with unseen partners than baselines that use only environmental symmetries.
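
Why expected return symmetries form a group that contains environment symmetries can be sketched in a few lines. The definition below is an assumption for illustration: it requires return to be preserved for every policy in a set Π, and the paper's exact formalization may restrict this differently (for example, to a particular class of policies).

```latex
% Assumed notation (not from the summary): \Pi is a set of joint policies,
% J(\pi) is the expected return of \pi, and \phi ranges over bijections \Pi \to \Pi.
\Phi_{\mathrm{ER}} = \{\, \phi : J(\phi(\pi)) = J(\pi) \ \ \forall \pi \in \Pi \,\}

% Closure under composition: for \phi, \psi \in \Phi_{\mathrm{ER}} and any \pi \in \Pi,
J\big((\phi \circ \psi)(\pi)\big) = J\big(\phi(\psi(\pi))\big) = J\big(\psi(\pi)\big) = J(\pi)

% Inverses: apply \phi \in \Phi_{\mathrm{ER}} to the policy \phi^{-1}(\pi),
J(\pi) = J\big(\phi(\phi^{-1}(\pi))\big) = J\big(\phi^{-1}(\pi)\big)
```

The identity map is trivially in the set. An environment symmetry relabels states, actions, and observations while preserving the transition, reward, and observation functions, so it leaves J unchanged for every policy. Under this assumed definition it is therefore an element of the set, and because environment symmetries are themselves closed under composition and inversion, they form a subgroup.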

📝 Abstract

Symmetry is an important inductive bias that can improve model robustness and generalization across many deep learning domains. In multi-agent settings, a priori known symmetries have been shown to address a fundamental coordination failure mode known as mutually incompatible symmetry breaking. For example, in a game where two independent agents can each move "left" or "right" and receive a reward of +1 if they choose the same action and -1 otherwise, agents trained separately may settle on opposite conventions and then always receive -1 when paired. However, the efficient and automatic discovery of environment symmetries, in particular for decentralized partially observable Markov decision processes, remains an open problem. Furthermore, environmental symmetry breaking constitutes only one type of coordination failure, which motivates the search for a more accessible and broader symmetry class. In this paper, we introduce such a broader group of previously unexplored symmetries, which we call expected return symmetries and which contains environment symmetries as a subgroup. We show that agents trained to be compatible under the group of expected return symmetries achieve better zero-shot coordination results than those using environment symmetries. As an additional benefit, our method makes minimal a priori assumptions about the structure of the environment and does not require access to ground-truth symmetries.
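
The left/right coordination failure can be made concrete with a few lines of Python. This is a minimal sketch, not the paper's method: the policy representation (a dict of action probabilities), the helper names, and the use of an Other-Play-style symmetry-averaged score as a stand-in for "training to be compatible under a symmetry group" are all assumptions for illustration.

```python
import itertools

# Action labels for the two-player game described in the abstract.
ACTIONS = ("left", "right")


def reward(a1, a2):
    """+1 if both agents choose the same action, -1 otherwise."""
    return 1 if a1 == a2 else -1


def expected_return(p1, p2):
    """Expected reward when two independent agents sample actions from p1 and p2.

    Each policy is a dict mapping action -> probability (one-shot game, no state).
    """
    return sum(p1[a] * p2[b] * reward(a, b) for a in ACTIONS for b in ACTIONS)


def relabelings():
    """All bijections of the action set, as dicts (candidate symmetries)."""
    return [dict(zip(ACTIONS, perm)) for perm in itertools.permutations(ACTIONS)]


def relabel(policy, phi):
    """Push a policy through relabeling phi: it plays phi[a] wherever the original played a."""
    return {phi[a]: p for a, p in policy.items()}


def is_env_symmetry(phi):
    """phi is an environment symmetry of this matrix game if every joint reward is unchanged."""
    return all(reward(phi[a], phi[b]) == reward(a, b) for a in ACTIONS for b in ACTIONS)


def symmetry_averaged_return(policy, symmetries):
    """Assumed Other-Play-style score: mean return of the policy paired with each relabeled copy of itself."""
    total = sum(expected_return(policy, relabel(policy, phi)) for phi in symmetries)
    return total / len(symmetries)


always_left = {"left": 1.0, "right": 0.0}
always_right = {"left": 0.0, "right": 1.0}

print(expected_return(always_left, always_left))    # self-play: 1.0
print(expected_return(always_left, always_right))   # cross-play after incompatible symmetry breaking: -1.0
print(all(is_env_symmetry(phi) for phi in relabelings()))   # True: swapping the labels is a symmetry
print(symmetry_averaged_return(always_left, relabelings()))  # 0.0
```

Because swapping the two labels preserves every joint reward, the symmetry-averaged score is 0.0 for every policy in this game, so an arbitrary convention such as "always left" is never preferred over its mirror image. This is the intuition behind using known symmetries to avoid mutually incompatible symmetry breaking.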

Problem

Research questions and friction points this paper is trying to address.

Discovering environment symmetries automatically

Addressing coordination failure in multi-agent settings

Improving zero-shot coordination with expected return symmetries

Innovation

Methods, ideas, or system contributions that make the work stand out.

Expected return symmetries introduced

Minimal a priori environment assumptions

Improved zero-shot coordination results

🔎 Similar Papers

Remove Symmetries to Control Model Expressivity

2024-08-28 · arXiv.org · Citations: 1