Reciprocal Reward Influence Encourages Cooperation From Self-Interested Agents

📅 2024-06-03
🏛️ arXiv.org
📈 Citations: 0
Influential: 0
🤖 AI Summary
Problem: How can long-term reciprocal cooperation be induced among self-interested agents in sequential social dilemmas, under limited learning time and without knowledge of opponents' strategies?
Method: The paper proposes the Reciprocator agent, which responds intrinsically to how an opponent's actions affect its own returns. It designs a reward-based reciprocity mechanism that requires neither differentiation through opponent policies nor meta-game modeling, and employs a Q-value shaping framework that integrates counterfactual return estimation with dynamic reward re-scaling.
Contribution/Results: This work achieves learning-rule-agnostic and sample-efficient cooperation induction, stably attaining Pareto-optimal cooperation across multiple canonical social dilemmas (e.g., Iterated Prisoner's Dilemma, Stag Hunt, Chicken) and significantly outperforming independent learners and state-of-the-art opponent-shaping approaches.

📝 Abstract
Cooperation between self-interested individuals is a widespread phenomenon in the natural world, but remains elusive in interactions between artificially intelligent agents. Instead, naive reinforcement learning algorithms typically converge to Pareto-dominated outcomes in even the simplest of social dilemmas. An emerging literature on opponent shaping has demonstrated the ability to reach prosocial outcomes by influencing the learning of other agents. However, such methods differentiate through the learning step of other agents or optimize for meta-game dynamics, which rely on privileged access to opponents' learning algorithms or exponential sample complexity, respectively. To provide a learning rule-agnostic and sample-efficient alternative, we introduce Reciprocators, reinforcement learning agents which are intrinsically motivated to reciprocate the influence of opponents' actions on their returns. This approach seeks to modify other agents' $Q$-values by increasing their return following beneficial actions (with respect to the Reciprocator) and decreasing it after detrimental actions, guiding them towards mutually beneficial actions without directly differentiating through a model of their policy. We show that Reciprocators can be used to promote cooperation in temporally extended social dilemmas during simultaneous learning. Our code is available at https://github.com/johnlyzhou/reciprocator/.
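The abstract's core mechanism, shaping an opponent's Q-values by reciprocating the influence of its actions on one's own returns, can be illustrated with a minimal sketch. All names and the linear re-scaling here are assumptions for illustration, not the paper's actual implementation (see the linked repository for that):

```python
# Hedged sketch of a reciprocal intrinsic reward, assuming influence is
# measured as the gap between the actual return and a counterfactual
# return estimated as if the opponent had acted otherwise.

class Reciprocator:
    """Toy agent that tracks opponent influence on its returns and emits a
    shaping reward added to the opponent's reward stream, nudging the
    opponent's Q-values toward mutually beneficial actions."""

    def __init__(self, scale: float = 1.0):
        self.scale = scale  # dynamic re-scaling reduced to a constant (assumption)
        self.influence = 0.0

    def observe(self, actual_return: float, counterfactual_return: float) -> None:
        # Positive influence: the opponent's action helped us relative to
        # the counterfactual baseline; negative: it hurt us.
        self.influence += actual_return - counterfactual_return

    def shaping_reward(self) -> float:
        # Reciprocate in proportion to accumulated influence, so beneficial
        # opponent actions raise its return and detrimental ones lower it.
        return self.scale * self.influence


# Example: the opponent cooperated, yielding us a return of 3.0 where the
# counterfactual (defection) would have yielded 1.0.
agent = Reciprocator(scale=0.5)
agent.observe(actual_return=3.0, counterfactual_return=1.0)
print(agent.shaping_reward())  # positive: reward the opponent's cooperation
```

Because the shaping reward enters the opponent's return directly, no gradient is taken through the opponent's learning step, which is what makes the approach learning-rule-agnostic.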
Problem

Research questions and friction points this paper is trying to address.

Reciprocal Cooperation
Artificial Agents
Limited Learning Time
Innovation

Methods, ideas, or system contributions that make the work stand out.

Adaptive Strategy
Reciprocal Agent Model
Efficient Cooperation
John L. Zhou
University of California, Los Angeles
Weizhe Hong
University of California, Los Angeles
Jonathan C. Kao
University of California, Los Angeles