🤖 AI Summary
This study addresses the persistent welfare loss in cooperative settings involving large language model agents, showing that mechanism design alone cannot fully eliminate inefficiencies when contracts are inherently incomplete and fail to cover all future contingencies. Drawing on incomplete contract theory, the work gives a theoretical proof that a strictly positive welfare gap remains under any realistic mechanism. To close this gap, the paper proposes prosociality as a necessary complement: agents are designed to intrinsically value both their own and others’ welfare, compensating for the limits of formal mechanisms. Experiments with LLM-powered agents in multi-agent resource-allocation environments and canonical social dilemmas indicate that this approach yields outcomes that are both socially superior and individually beneficial.
📝 Abstract
Ensuring that AI agents behave safely and beneficially when interacting with other parties has emerged as one of the central challenges of modern AI safety. Mechanism design, the theory of designing rules that align individual and collective objectives, can incentivize cooperative behavior, but whether it alone suffices to maximize the social welfare of LLM agents remains an open question. This work proves that the answer is negative: drawing on incomplete contract theory, we formally show that when contracts cannot distinguish all relevant future contingencies, there is a strictly positive welfare loss that no realistic mechanism can eliminate. We then show that prosocial agents, who weigh others' welfare alongside their own, can close this gap and achieve outcomes that are socially superior and individually beneficial. Experiments in multi-agent resource-allocation environments and canonical social dilemmas, with agents powered by large language models, confirm that prosociality is beneficial. The implication for AI safety is clear: to enable cooperative interactions at scale, designing adequate mechanisms is not sufficient; agents must be built to be intrinsically prosocial.
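To make the idea of "weighing others' welfare alongside their own" concrete, here is a minimal Python sketch on a textbook Prisoner's Dilemma. It is an illustration only, not the paper's model or experimental setup: the payoff values, the linear utility `own + alpha * other`, and the names `PAYOFFS`, `alpha`, `prosocial_utility`, `best_response`, `pure_nash`, and `material_welfare` are all assumptions introduced here.

```python
# Illustrative sketch only: a textbook Prisoner's Dilemma with assumed payoffs.
# The linear prosocial utility below is a common modeling choice, not
# necessarily the formulation used in the paper.

ACTIONS = ("C", "D")  # cooperate, defect

# PAYOFFS[(my_action, their_action)] = (my_material_payoff, their_material_payoff)
PAYOFFS = {
    ("C", "C"): (3, 3),
    ("C", "D"): (0, 5),
    ("D", "C"): (5, 0),
    ("D", "D"): (1, 1),
}


def prosocial_utility(own, other, alpha):
    """Utility of an agent that weighs the other's payoff by alpha (assumed form)."""
    return own + alpha * other


def best_response(opponent_action, alpha):
    """Action maximizing prosocial utility against a fixed opponent action."""
    def value(action):
        own, other = PAYOFFS[(action, opponent_action)]
        return prosocial_utility(own, other, alpha)
    return max(ACTIONS, key=value)


def pure_nash(alpha):
    """Pure-strategy profiles in which both agents play mutual best responses.

    The game is symmetric, so the same best_response function serves both players.
    """
    return [
        (a1, a2)
        for a1 in ACTIONS
        for a2 in ACTIONS
        if best_response(a2, alpha) == a1 and best_response(a1, alpha) == a2
    ]


def material_welfare(profile):
    """Sum of material payoffs (ignoring the prosocial term) for an action profile."""
    return sum(PAYOFFS[profile])


if __name__ == "__main__":
    for alpha in (0.0, 1.0):
        for profile in pure_nash(alpha):
            print(alpha, profile, PAYOFFS[profile], material_welfare(profile))
```

With these assumed payoffs, purely self-interested agents (`alpha = 0.0`) settle on mutual defection, while agents that weigh the other's payoff equally (`alpha = 1.0`) settle on mutual cooperation, where each agent's own material payoff is also higher. This mirrors, in a toy setting, the abstract's claim that prosociality can be both socially superior and individually beneficial; it says nothing about the incomplete-contract result itself.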