🤖 AI Summary
This work addresses the challenge of maximizing collective utility in Markov decision processes with multiple principals whose heterogeneous time preferences, captured by distinct discount factors and reward functions, make conventional single-discount models inadequate. Focusing on utilitarian social welfare, defined as the sum of all principals' utilities, the paper studies the synthesis of agent strategies that maximize it. The theoretical analysis shows that optimal strategies are no longer positional, but they can be realized by pure finite-memory counting strategies that require only polynomial memory in the system size and can be synthesized in polynomial time. In contrast, restricting to positional strategies incurs a loss in social welfare, and the associated threshold decision problem is NP-hard. Insisting on positional strategies therefore buys neither tractability nor optimal welfare.
📝 Abstract
In several socioeconomically critical decision-making settings, such as fair resource allocation, climate policy, or AI alignment, multiple principals interact within a common arena. While it is well established that these principals may have differing preferences, decision-making under heterogeneous time preferences remains relatively unexplored. In particular, principals may weigh future outcomes differently and may derive distinct utilities from the same decisions. Motivated by such scenarios, we introduce the notion of heterogeneous time preferences in MDPs, where multiple principals possess distinct reward functions and apply different discount factors to future rewards. To make meaningful decisions in such settings, an AI agent must rely on a notion of optimality that accounts for the preferences of all principals. We adopt a utilitarian notion of social welfare, defined as the sum of utilities accrued to all principals, and study the synthesis of agent strategies that maximise this welfare. Under heterogeneous time preferences, we show that optimal strategies are no longer positional, even when all principals receive identical rewards. Nevertheless, optimal strategies remain structurally simple: they can be realized as pure finite-memory counting strategies, require only polynomial memory in the system size, and can be synthesized in polynomial time. On the other hand, we show that deciding threshold questions for optimal positional strategies is NP-hard, exposing a poor trade-off: insisting on positional simplicity neither makes synthesis tractable nor preserves social welfare.
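To make the non-positionality claim concrete, here is a minimal sketch on a hypothetical two-state MDP of our own construction (not taken from the paper). It assumes deterministic play and writes the welfare of a run as $W=\sum_i\sum_{t\ge 0}\lambda_i^t\, r_i(s_t,a_t)$, where $\lambda_i$ and $r_i$ are principal $i$'s discount factor and reward function (notation assumed here). Both principals receive identical rewards; only their discount factors (0.5 and 0.9, chosen for illustration) differ. Because the combined weight $\lambda_1^t+\lambda_2^t$ is not geometric, whether delaying a switch pays off depends on how many steps have already elapsed, so the best strategy counts steps before switching. The code compares counting strategies only against the two pure positional choices at `s0`; it is an illustration, not the paper's synthesis algorithm.

```python
# Hypothetical two-state example (illustrative, not from the paper):
# s0 has actions "stay" (reward A, remain in s0) and
# "go" (reward 0, move to absorbing s1, which pays B every step thereafter).
# Both principals receive the same rewards but discount them differently.

LAMBDAS = (0.5, 0.9)  # discount factors of the two principals (illustrative)
A, B = 0.8, 1.0       # per-step rewards for staying in s0 and for s1 (illustrative)


def weight(t: int) -> float:
    """Total discount weight of step t, summed over all principals."""
    return sum(lam ** t for lam in LAMBDAS)


def tail(m: int) -> float:
    """Sum of weight(t) over all t >= m, using closed-form geometric tails."""
    return sum(lam ** m / (1 - lam) for lam in LAMBDAS)


def welfare_switch_at(k: int) -> float:
    """Welfare of the counting strategy 'stay k times, then go'.

    Steps 0..k-1 earn A, step k (the 'go' move) earns 0,
    and every step from k+1 onwards earns B in s1.
    """
    return A * sum(weight(t) for t in range(k)) + B * tail(k + 1)


# The two pure positional strategies at s0: go immediately (k = 0) or stay forever.
go_now = welfare_switch_at(0)
stay_forever = A * tail(0)

# Counting strategies: search the switching time k up to an arbitrary bound of 50.
best_k = max(range(50), key=welfare_switch_at)
best = welfare_switch_at(best_k)

print(f"go now:       {go_now:.3f}")
print(f"stay forever: {stay_forever:.3f}")
print(f"switch at k={best_k}: {best:.3f}")
```

Running it prints 10.000 for going immediately, 9.600 for staying forever, and 10.260 for switching at k=2, so a strategy with a small step counter beats both pure positional strategies. The switching time follows from the marginal gain of one extra "stay", A·w(k) − B·w(k+1) with w = `weight`, which is positive for k = 0 and k = 1 and negative from k = 2 onwards.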