🤖 AI Summary
This work addresses the tension between efficiency and equity in mixed-motive Markov games, where conventional utilitarian objectives often yield socially efficient but unfairly distributed outcomes, thereby undermining individual incentives to cooperate. To tackle this issue, the paper introduces the principle of proportional fairness into multi-agent reinforcement learning for the first time, proposing a fair altruistic utility function defined in the log-return space of individual agents and establishing a Fair Markov Game framework. Through game-theoretic analysis and the design of a Fair Actor-Critic algorithm, the authors derive analytical conditions that promote cooperation. Empirical evaluations across diverse social dilemma environments demonstrate that the proposed approach effectively yields more equitable and stable cooperative strategies, substantially mitigating the payoff imbalance inherent in traditional methods and offering a novel paradigm for balancing fairness and efficiency.
📝 Abstract
Cooperation is fundamental to society's viability, as it enables the emergence of structure within heterogeneous groups that seek collective well-being. However, individuals are inclined to defect in order to benefit from the group's cooperation without bearing the associated costs, leading to unfair situations. In game theory, social dilemmas capture this dichotomy between individual interest and collective outcome. The dominant approach to multi-agent cooperation is utilitarian welfare, which can produce outcomes that are efficient but highly inequitable. This paper proposes a novel framework to foster fairer cooperation by replacing the standard utilitarian objective with Proportional Fairness. We introduce a fair altruistic utility for each agent, defined in individual log-payoff space, and derive the analytical conditions required to ensure cooperation in classic social dilemmas. We then extend this framework to sequential settings by defining a Fair Markov Game and deriving novel fair Actor-Critic algorithms to learn fair policies. Finally, we evaluate our method in various social dilemma environments.
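To make the contrast between the two objectives concrete, here is a minimal illustrative sketch (not the paper's algorithm) comparing utilitarian welfare, the sum of payoffs, with the proportionally fair objective, a sum of log payoffs. The allocation values are hypothetical; the point is that a skewed split can match an equal split on the utilitarian criterion while scoring strictly worse under proportional fairness:

```python
import math

def utilitarian_welfare(payoffs):
    """Standard utilitarian objective: the sum of individual payoffs."""
    return sum(payoffs)

def proportional_fair_welfare(payoffs):
    """Proportional fairness: sum of log payoffs.

    Maximizing this objective penalizes driving any agent's payoff
    toward zero, trading some total efficiency for a more
    equitable distribution. Requires strictly positive payoffs.
    """
    assert all(p > 0 for p in payoffs), "log-payoff space needs positive payoffs"
    return sum(math.log(p) for p in payoffs)

# Two hypothetical allocations with identical utilitarian welfare:
equal = [5.0, 5.0]   # equitable split
skewed = [9.0, 1.0]  # same total, highly unequal

print(utilitarian_welfare(equal) == utilitarian_welfare(skewed))            # True
print(proportional_fair_welfare(equal) > proportional_fair_welfare(skewed))  # True
```

Because the log is concave, the proportionally fair objective strictly prefers the equal allocation whenever totals are tied, which is the fairness property the abstract invokes.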