🤖 AI Summary
In multi-agent reinforcement learning (MARL), a fundamental tension exists between individual self-interest and collective welfare. To address this, we propose Suggestion Sharing (SS), a novel mechanism in which agents exchange only action suggestions, without sharing rewards, value functions, policy parameters, or sensitive state information. We theoretically bound the discrepancy between individual and collective objectives, showing how exchanged suggestions can align agents' behaviours with the collective optimum. Empirically, SS matches or exceeds mainstream baselines that share rewards, values, or policies on canonical social-dilemma benchmarks, while revealing substantially less private information, thus striking a favourable trade-off between cooperative efficiency and privacy preservation. Our core contribution is a lightweight, decentralized, and privacy-preserving pathway to collective optimization, offering a principled alternative to conventional centralized or information-intensive coordination mechanisms.
📝 Abstract
In human society, the conflict between self-interest and collective well-being often obstructs efforts to achieve shared welfare. Related concepts such as the Tragedy of the Commons and social dilemmas frequently manifest in our daily lives. As artificial agents increasingly serve as autonomous proxies for humans, we propose using multi-agent reinforcement learning (MARL) to address this issue: learning policies that maximise collective returns even when individual agents' interests conflict with the collective interest. Traditional MARL solutions involve sharing rewards, values, or policies, or designing intrinsic rewards to encourage agents to learn collectively optimal policies. We introduce a novel MARL approach based on Suggestion Sharing (SS), in which agents exchange only action suggestions. This method enables effective cooperation without the need to design intrinsic rewards, achieving strong performance while revealing less private information than sharing rewards, values, or policies. Our theoretical analysis establishes a bound on the discrepancy between collective and individual objectives, demonstrating how sharing suggestions can align agents' behaviours with the collective objective. Experimental results show that SS performs competitively with baselines that rely on value or policy sharing or intrinsic rewards.
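The core idea above can be illustrated with a toy sketch: in a public-goods-style dilemma, defection is individually dominant, but each agent benefits when its peers cooperate, so each agent *suggests* cooperation to the others; folding those suggestions into each agent's action choice tips the group toward the collectively optimal outcome. This is a minimal illustration only, not the paper's algorithm: the payoff constants, the additive suggestion bonus, the `weight` parameter, and all function names are hypothetical choices made for this sketch.

```python
COOPERATE, DEFECT = 0, 1

def reward(my_action, others):
    """Toy public-goods payoff: each cooperator adds 0.4 to everyone's
    reward; defecting grants a private 0.5 bonus. Defection is the
    individually dominant move, yet mutual cooperation yields a higher
    collective return -- a social dilemma."""
    n_coop = (1 - my_action) + sum(1 - a for a in others)
    return 0.4 * n_coop + (0.5 if my_action == DEFECT else 0.0)

def choose(own_values, suggestions, weight):
    """Pick the action scoring highest under the agent's own value
    estimates plus a bonus of `weight` per peer suggestion received
    for that action (the additive blend is an assumption of this toy)."""
    score = list(own_values)
    for s in suggestions:
        score[s] += weight
    return max(range(len(score)), key=score.__getitem__)

n_agents = 4
# Self-interested value estimates an isolated agent would learn:
# cooperating yields only its own 0.4 contribution, defecting yields 0.5.
own_values = [0.4, 0.5]

# Without suggestions, every agent defects.
no_ss = [choose(own_values, [], weight=0.0) for _ in range(n_agents)]

# Each peer suggests cooperation, since the peer's cooperation raises
# this agent's own reward; the suggestion bonus tips the choice.
suggestions = [COOPERATE] * (n_agents - 1)
with_ss = [choose(own_values, suggestions, weight=0.3) for _ in range(n_agents)]

print("without SS:", no_ss)  # all defect
print("with SS:   ", with_ss)  # all cooperate
```

In this toy, the all-defect profile gives each agent 0.5 (collective return 2.0), while all-cooperate gives each 1.6 (collective return 6.4), so acting on exchanged suggestions closes the gap between individual and collective objectives without any agent revealing its rewards, values, or policy.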