AI Summary
This work addresses a challenge in mixed-motive games: self-interested agents often defect for short-term gains, and existing punishment mechanisms struggle to promote cooperation while keeping enforcement costs under control. The paper proposes Adaptive Punishment for Cooperation (APC), a distributed method that sets punishment intensity from both a dynamic punishment probability and the severity of defection. APC uses a defection awareness module, whose learning is guided by the game reward, to assess whether and how severely an agent has defected, and it optimizes punishment policies within a distributed reinforcement learning framework. Theoretical and empirical results show that APC performs effectively in the iterated public goods game, and experiments show that it significantly outperforms existing baselines in sequential social dilemmas. It reduces unnecessary punishment costs while still deterring defection strategically, which improves group-level cooperation and collective welfare.
Abstract
Mixed-motive scenarios are ubiquitous in real-world multi-agent interactions, where self-interested agents often defect for immediate rewards, overlooking the potential of altruistic cooperation to improve long-term gains and collective welfare. Peer punishment can deter defection, but as a costly form of second-order altruism, its persistent imposition may undermine the punisher's own interests. Existing approaches often struggle to implement punishment in a way that effectively promotes cooperation. To balance the efficacy and cost of punishment, we propose Adaptive Punishment for Cooperation (APC), a distributed method that determines punishment intensity based on both a dynamic punishment probability and the severity of defection. This dynamic probability substantially reduces costly and ineffective punishment while still promoting cooperation. To accurately assess defection and its severity, we use a defection awareness module, whose learning is guided by the game reward. Theoretical analysis and empirical results show that APC performs effectively in the iterated public goods game. Empirically, APC also significantly outperforms existing baselines across sequential social dilemmas, learning rational and effective punishment policies that foster cooperation by strategically deterring defection.
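To make the idea of probability-gated, severity-scaled punishment concrete, the following is a minimal Python sketch of one round of a public goods game with peer punishment. It is an illustration under stated assumptions, not the authors' implementation: the function names, the severity measure (fraction of the endowment withheld), the linear scaling of intensity, and the parameters `multiplier`, `cost_ratio`, and `fine_ratio` are all hypothetical. In APC the punishment probability is dynamic and severity is assessed by a learned defection awareness module; here both are replaced by simple stand-ins.

```python
import random


def public_goods_payoffs(contributions, endowment=1.0, multiplier=1.6):
    """Standard public goods payoffs: each agent keeps what it did not
    contribute and receives an equal share of the multiplied common pot."""
    n = len(contributions)
    share = multiplier * sum(contributions) / n
    return [endowment - c + share for c in contributions]


def defection_severity(contribution, endowment=1.0):
    """Hypothetical severity in [0, 1]: the fraction of the endowment withheld.
    (A stand-in for the learned defection awareness module.)"""
    return min(1.0, max(0.0, (endowment - contribution) / endowment))


def punishment_intensity(severity, punish_prob, max_intensity=1.0, rng=random):
    """With probability punish_prob, punish in proportion to severity;
    otherwise (or if there was no defection) do not punish at all."""
    if severity <= 0.0 or rng.random() >= punish_prob:
        return 0.0
    return max_intensity * severity


def apply_punishment(payoffs, punisher, target, intensity,
                     cost_ratio=0.3, fine_ratio=1.0):
    """Peer punishment is costly: the punisher pays cost_ratio * intensity
    and the target loses fine_ratio * intensity. Returns a new payoff list."""
    out = list(payoffs)
    out[punisher] -= cost_ratio * intensity
    out[target] -= fine_ratio * intensity
    return out


if __name__ == "__main__":
    contributions = [1.0, 1.0, 0.0, 0.5]  # agent 2 fully defects
    payoffs = public_goods_payoffs(contributions)
    severity = defection_severity(contributions[2])
    # punish_prob is fixed here purely for illustration.
    intensity = punishment_intensity(severity, punish_prob=0.5)
    print(apply_punishment(payoffs, punisher=0, target=2, intensity=intensity))
```

In this sketch, lowering `punish_prob` reduces how often the punisher pays the enforcement cost, while scaling by severity concentrates punishment on the most serious defections. That is the efficacy-versus-cost trade-off the abstract describes, reduced to its simplest form.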