🤖 AI Summary
This work addresses the limitations of existing fairness-based multi-agent reinforcement learning approaches in asymmetric sequential social dilemmas, where enforcing strict equality often triggers defection and fails to accommodate inherent agent heterogeneity. To overcome this, the paper introduces a notion of fairness tailored to asymmetric environments: it redefines fairness in terms of agents' reward ranges, adds an agent-based weighting mechanism to handle inherent asymmetries, and leverages local social feedback to eliminate reliance on global information. The proposed method fosters cooperation under partial observability, accelerates the emergence of cooperative strategies in asymmetric settings, and outperforms current approaches. Moreover, it retains scalability and practicality, making it well-suited to real-world multi-agent systems characterized by asymmetry and limited observability.
📝 Abstract
Sequential Social Dilemmas (SSDs) provide a key framework for studying how cooperation emerges when individual incentives conflict with collective welfare. In Multi-Agent Reinforcement Learning, these problems are often addressed by incorporating intrinsic drives that encourage prosocial or fair behavior. However, most existing methods assume that agents face identical incentives in the dilemma and require continuous access to global information about other agents to assess fairness. In this work, we introduce asymmetric variants of well-known SSD environments and examine how natural differences between agents influence cooperation dynamics. Our findings reveal that existing fairness-based methods struggle to adapt under asymmetric conditions because they enforce raw equality, which wrongly incentivizes defection. To address this, we propose three modifications: (i) redefining fairness by accounting for agents' reward ranges, (ii) introducing an agent-based weighting mechanism to better handle inherent asymmetries, and (iii) localizing social feedback to make the methods effective under partial observability without requiring global information sharing. Experimental results show that in asymmetric scenarios, our method fosters faster emergence of cooperative policies compared to existing approaches, without sacrificing scalability or practicality.
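To make modification (i) concrete, here is a minimal, hypothetical sketch of a range-aware fairness signal in the style of inequity aversion. The paper's exact formulation is not given here; the function name, the `alpha`/`beta` penalty coefficients, and the normalization scheme are all illustrative assumptions. The idea shown: each agent's reward is first normalized by that agent's own attainable reward range, so that agents with naturally different payoff scales are compared on a common footing before any inequity penalty is applied.

```python
import numpy as np

def range_normalized_fairness(rewards, reward_ranges, alpha=5.0, beta=0.05):
    """Illustrative sketch (not the paper's exact method): inequity-aversion-
    style intrinsic penalties computed on range-normalized rewards.

    alpha penalizes disadvantageous inequity (others earn more than me),
    beta penalizes advantageous inequity (I earn more than others),
    following the classic inequity-aversion split.
    """
    r = np.asarray(rewards, dtype=float)
    # Normalize each agent's reward by its own reward range, so asymmetric
    # agents are judged on relative rather than raw attainment.
    norm = r / np.asarray(reward_ranges, dtype=float)
    n = len(r)
    intrinsic = np.zeros(n)
    for i in range(n):
        diff = norm - norm[i]                    # others' normalized payoff minus mine
        disadvantage = np.clip(diff, 0, None).sum() / (n - 1)
        advantage = np.clip(-diff, 0, None).sum() / (n - 1)
        intrinsic[i] = -alpha * disadvantage - beta * advantage
    return r + intrinsic                         # shaped reward each agent trains on

# Two asymmetric agents: one can earn up to 10, the other up to 1.
# Both attain their full range, so normalized payoffs match and no
# inequity penalty is applied, even though raw rewards differ tenfold.
shaped = range_normalized_fairness([10.0, 1.0], [10.0, 1.0])
```

Under raw-equality fairness, the same pair of rewards (10 vs. 1) would look maximally unfair and penalize the high-range agent, which is exactly the failure mode the abstract describes for asymmetric settings.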