Model-Based Soft Maximization of Suitable Metrics of Long-Term Human Power

📅 2025-07-31
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
This paper addresses the tension between AI safety and human welfare by redefining the optimization objective around “human power”, a normative construct capturing agency, autonomy, and collective capability. Method: The authors propose a partially axiomatic framework featuring a parametrizable, decomposable objective function that explicitly encodes long-horizon considerations, inequality aversion, and risk aversion in aggregating human power. The model accounts for humans' bounded rationality, social norms, and, crucially, a wide variety of possible human goals, enabling adaptive power balancing under dynamic conditions. The metric can be computed exactly by backward induction or approximated via world-model-based multi-agent reinforcement learning. Contribution/Results: By substituting soft maximization of human power for conventional utility maximization, the approach aims to mitigate instrumental convergence risks. Worked examples across paradigmatic scenarios indicate that the framework tends to induce beneficial instrumental subgoals, and the authors cautiously assess it as a safer objective for agentic AI systems than direct utility-based objectives.
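
The paper's exact objective function is not reproduced in this summary. As a minimal sketch of what an inequality- and risk-averse aggregate of per-human power might look like, the code below uses Atkinson-style isoelastic transforms; the functional form and the parameters `eta` (inequality aversion) and `rho` (risk aversion) are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def _isoelastic(x, a):
    """Concave isoelastic transform; larger a means more aversion."""
    return np.log(x) if a == 1.0 else x ** (1.0 - a) / (1.0 - a)

def _inv_isoelastic(y, a):
    """Inverse of the isoelastic transform."""
    return np.exp(y) if a == 1.0 else ((1.0 - a) * y) ** (1.0 / (1.0 - a))

def aggregate_power(per_human_power, eta=1.0):
    """Inequality-averse aggregate across humans at one point in time.

    per_human_power: array of positive power estimates, one per human.
    eta: hypothetical inequality-aversion parameter (eta=0 -> plain mean).
    """
    p = np.asarray(per_human_power, dtype=float)
    return _inv_isoelastic(_isoelastic(p, eta).mean(), eta)

def risk_averse_longterm(sampled_trajectory_aggregates, rho=0.5):
    """Risk-averse certainty equivalent over sampled futures.

    Each sample is a (discounted) long-term aggregate of human power from
    one rollout of the world model; rho=0 recovers the plain expectation.
    """
    v = np.asarray(sampled_trajectory_aggregates, dtype=float)
    return _inv_isoelastic(_isoelastic(v, rho).mean(), rho)
```

With `eta = 1` the cross-human aggregate reduces to the geometric mean of individual power, which already penalizes concentrating power in a few hands; larger `eta` penalizes inequality more strongly.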

๐Ÿ“ Abstract
Power is a key concept in AI safety: power-seeking as an instrumental goal, sudden or gradual disempowerment of humans, power balance in human-AI interaction and international AI governance. At the same time, power as the ability to pursue diverse goals is essential for wellbeing. This paper explores the idea of promoting both safety and wellbeing by forcing AI agents explicitly to empower humans and to manage the power balance between humans and AI agents in a desirable way. Using a principled, partially axiomatic approach, we design a parametrizable and decomposable objective function that represents an inequality- and risk-averse long-term aggregate of human power. It takes into account humans' bounded rationality and social norms, and, crucially, considers a wide variety of possible human goals. We derive algorithms for computing that metric by backward induction or approximating it via a form of multi-agent reinforcement learning from a given world model. We exemplify the consequences of (softly) maximizing this metric in a variety of paradigmatic situations and describe what instrumental sub-goals it will likely imply. Our cautious assessment is that softly maximizing suitable aggregate metrics of human power might constitute a beneficial objective for agentic AI systems that is safer than direct utility-based objectives.
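
To make the backward-induction computation the abstract mentions concrete, here is a minimal sketch over an assumed small tabular world model. The world-model interface and the log-sum-exp soft maximum are illustrative assumptions, not the paper's actual algorithm.

```python
import math

def soft_max(values, temp=1.0):
    """Numerically stable log-sum-exp 'soft maximum' (temp -> 0 gives max)."""
    m = max(values)
    return m + temp * math.log(sum(math.exp((v - m) / temp) for v in values))

def power_value_backward_induction(world, horizon, gamma=0.95, temp=1.0):
    """Finite-horizon value of a long-term human-power aggregate.

    Assumed world-model interface (hypothetical, for illustration only):
      world.states            -> iterable of states
      world.actions(s)        -> iterable of actions available in s
      world.transition(s, a)  -> iterable of (probability, next_state)
      world.human_power(s)    -> aggregated human power in state s

    The backup softly maximizes over the AI's actions rather than
    hard-maximizing, echoing the paper's soft-maximization idea; the
    exact operator used there may differ.
    """
    values = {s: world.human_power(s) for s in world.states}  # horizon 0
    for _ in range(horizon):
        values = {
            s: world.human_power(s) + gamma * soft_max(
                [sum(p * values[s2] for p, s2 in world.transition(s, a))
                 for a in world.actions(s)],
                temp,
            )
            for s in world.states
        }
    return values
```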
Problem

Research questions and friction points this paper is trying to address.

Designing an AI objective function for human empowerment and safety
Balancing power between humans and AI in diverse scenarios
Developing algorithms to compute and optimize human power metrics
Innovation

Methods, ideas, or system contributions that make the work stand out.

Parametrizable objective function for human power
Multi-agent reinforcement learning approximation of the power metric
Soft maximization of aggregate human power metrics (sketched below)
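
As one common way to realize soft maximization, a Boltzmann policy samples actions in proportion to exp(Q / T) instead of taking a hard argmax; the Boltzmann form and the `temperature` parameter are assumptions for illustration, since the paper's exact soft-maximization operator is not spelled out in this listing.

```python
import numpy as np

def soft_max_policy(q_values, temperature=1.0):
    """Boltzmann ('soft maximization') action distribution.

    A hard argmax over a power metric could still drive extreme behavior;
    sampling actions with probability proportional to exp(Q / T) keeps the
    agent near-optimal while avoiding relentless optimization. Lower
    temperatures approach the hard argmax.
    """
    q = np.asarray(q_values, dtype=float)
    z = (q - q.max()) / temperature      # shift for numerical stability
    probs = np.exp(z)
    return probs / probs.sum()

# Example: a moderate temperature spreads probability over near-optimal actions.
probs = soft_max_policy([1.2, 0.7, 1.1], temperature=0.5)
action = np.random.default_rng(0).choice(len(probs), p=probs)
```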