🤖 AI Summary
This work addresses the negative externalities that arise when large language model (LLM) agents make individually rational decisions in shared environments, which can lead to persistent congestion and degraded system-level performance. To mitigate this, the authors propose Socially-Weighted Alignment (SWA), a game-theoretic framework that, at inference time, interpolates between an agent's private objective and an estimate of group welfare via a social weight λ ∈ [0, 1], so that agents voluntarily suppress demand under overload. SWA is presented as the first integration of social preference mechanisms from game theory into multi-agent LLM reasoning, and it requires neither parameter updates nor multi-agent reinforcement learning. For a shared-resource congestion game with n agents and congestion severity β, theoretical analysis yields a critical threshold λ* = (n − β)/(n − 1), above which agents have no marginal incentive to increase demand under overload. Multi-agent simulations confirm the predicted behavior: when λ exceeds λ*, the system undergoes a phase transition from persistent congestion to stable operation near capacity, improving overall efficiency.
📝 Abstract
Deploying large language model (LLM) agents in shared environments introduces a fundamental tension between individual alignment and collective stability: locally rational decisions can impose negative externalities that degrade system-level performance. We propose Socially-Weighted Alignment (SWA), a game-theoretic framework that modifies inference-time decision making by interpolating between an agent's private objective and an estimate of group welfare via a social weight $\lambda\in[0,1]$. In a shared-resource congestion game with $n$ agents and congestion severity $\beta$, we show that SWA induces a critical threshold $\lambda^*=(n-\beta)/(n-1)$ above which agents no longer have marginal incentive to increase demand under overload, yielding a phase transition from persistent congestion to stable operation near capacity. We further provide an inference-time algorithmic instantiation of SWA that does not require parameter updates or multi-agent reinforcement learning, and use a multi-agent simulation to empirically validate the predicted threshold behavior.
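To make the threshold concrete, here is a minimal Python sketch. Only the formula $\lambda^*=(n-\beta)/(n-1)$ comes from the abstract. The payoff model is an assumption chosen because it reproduces that threshold: each agent's utility is taken to be $u_i = x_i - (\beta/n)\cdot\text{overload}$, group welfare is the mean of the $u_j$, and the SWA score is $(1-\lambda)\,u_i + \lambda\,W$. The function names `critical_social_weight` and `swa_marginal_incentive` are hypothetical, and the paper's actual game and algorithm may differ.

```python
def critical_social_weight(n: int, beta: float) -> float:
    """Threshold lambda* = (n - beta) / (n - 1), as stated in the abstract."""
    if n < 2:
        raise ValueError("need at least two agents")
    return (n - beta) / (n - 1)


def swa_marginal_incentive(lam: float, n: int, beta: float) -> float:
    """Marginal SWA value of one more unit of demand while the system is overloaded.

    ASSUMPTION (toy model, not taken from the paper):
      u_i = x_i - (beta / n) * overload,  overload = max(0, sum(x) - capacity)
      W   = mean of all u_j
      SWA score = (1 - lam) * u_i + lam * W
    """
    private_gain = 1.0 - beta / n    # d u_i / d x_i under overload
    welfare_gain = (1.0 - beta) / n  # d W / d x_i under overload
    return (1.0 - lam) * private_gain + lam * welfare_gain


if __name__ == "__main__":
    n, beta = 10, 4.0  # illustrative values only
    lam_star = critical_social_weight(n, beta)
    print(f"lambda* = {lam_star:.4f}")
    for lam in (0.0, lam_star - 0.1, lam_star, lam_star + 0.1, 1.0):
        incentive = swa_marginal_incentive(lam, n, beta)
        print(f"lambda={lam:.3f}  marginal incentive={incentive:+.4f}")
```

Under this assumed model the marginal incentive simplifies to $\frac{1}{n}\left[(n-\beta)-\lambda(n-1)\right]$, which is positive below $\lambda^*$, zero at $\lambda^*$, and negative above it. The threshold lies in $[0,1]$ only when $1\le\beta\le n$.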