Model-Based Soft Maximization of Suitable Metrics of Long-Term Human Power

📅 2025-07-31
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
This paper addresses the tension between AI safety and human welfare by redefining the optimization objective around “human power”, a normative construct capturing agency, autonomy, and collective capability. Method: The authors propose a partially axiomatic framework featuring a parametrizable, decomposable objective function that explicitly encodes long-horizon considerations, inequality aversion, and risk aversion in aggregating human power. The model accounts for humans' bounded rationality, social norms, and, crucially, a wide variety of possible human goals, enabling adaptive power balancing under dynamic conditions. The metric can be computed exactly by backward induction or approximated via world-model-based multi-agent reinforcement learning. Contribution/Results: By substituting soft maximization of human power for conventional utility maximization, the approach aims to mitigate instrumental convergence risks. Worked examples across paradigmatic scenarios indicate that the framework tends to induce beneficial instrumental subgoals, and the authors cautiously assess it as a safer objective for agentic AI systems than direct utility-based objectives.
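
The paper's exact objective function is not reproduced in this summary. As a minimal sketch of what an inequality- and risk-averse aggregate of per-human power might look like, the code below uses Atkinson-style isoelastic transforms; the functional form and the parameters `eta` (inequality aversion) and `rho` (risk aversion) are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def _isoelastic(x, a):
    """Concave isoelastic transform; larger a means more aversion."""
    return np.log(x) if a == 1.0 else x ** (1.0 - a) / (1.0 - a)

def _inv_isoelastic(y, a):
    """Inverse of the isoelastic transform."""
    return np.exp(y) if a == 1.0 else ((1.0 - a) * y) ** (1.0 / (1.0 - a))

def aggregate_power(per_human_power, eta=1.0):
    """Inequality-averse aggregate across humans at one point in time.

    per_human_power: array of positive power estimates, one per human.
    eta: hypothetical inequality-aversion parameter (eta=0 -> plain mean).
    """
    p = np.asarray(per_human_power, dtype=float)
    return _inv_isoelastic(_isoelastic(p, eta).mean(), eta)

def risk_averse_longterm(sampled_trajectory_aggregates, rho=0.5):
    """Risk-averse certainty equivalent over sampled futures.

    Each sample is a (discounted) long-term aggregate of human power from
    one rollout of the world model; rho=0 recovers the plain expectation.
    """
    v = np.asarray(sampled_trajectory_aggregates, dtype=float)
    return _inv_isoelastic(_isoelastic(v, rho).mean(), rho)
```

With `eta = 1` the cross-human aggregate reduces to the geometric mean of individual power, which already penalizes concentrating power in a few hands; larger `eta` penalizes inequality more strongly.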

๐Ÿ“ Abstract
Power is a key concept in AI safety: power-seeking as an instrumental goal, sudden or gradual disempowerment of humans, power balance in human-AI interaction and international AI governance. At the same time, power as the ability to pursue diverse goals is essential for wellbeing. This paper explores the idea of promoting both safety and wellbeing by forcing AI agents explicitly to empower humans and to manage the power balance between humans and AI agents in a desirable way. Using a principled, partially axiomatic approach, we design a parametrizable and decomposable objective function that represents an inequality- and risk-averse long-term aggregate of human power. It takes into account humans' bounded rationality and social norms, and, crucially, considers a wide variety of possible human goals. We derive algorithms for computing that metric by backward induction or approximating it via a form of multi-agent reinforcement learning from a given world model. We exemplify the consequences of (softly) maximizing this metric in a variety of paradigmatic situations and describe what instrumental sub-goals it will likely imply. Our cautious assessment is that softly maximizing suitable aggregate metrics of human power might constitute a beneficial objective for agentic AI systems that is safer than direct utility-based objectives.
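
To make the backward-induction computation the abstract mentions concrete, here is a minimal sketch over an assumed small tabular world model. The world-model interface and the log-sum-exp soft maximum are illustrative assumptions, not the paper's actual algorithm.

```python
import math

def soft_max(values, temp=1.0):
    """Numerically stable log-sum-exp 'soft maximum' (temp -> 0 gives max)."""
    m = max(values)
    return m + temp * math.log(sum(math.exp((v - m) / temp) for v in values))

def power_value_backward_induction(world, horizon, gamma=0.95, temp=1.0):
    """Finite-horizon value of a long-term human-power aggregate.

    Assumed world-model interface (hypothetical, for illustration only):
      world.states            -> iterable of states
      world.actions(s)        -> iterable of actions available in s
      world.transition(s, a)  -> iterable of (probability, next_state)
      world.human_power(s)    -> aggregated human power in state s

    The backup softly maximizes over the AI's actions rather than
    hard-maximizing, echoing the paper's soft-maximization idea; the
    exact operator used there may differ.
    """
    values = {s: world.human_power(s) for s in world.states}  # horizon 0
    for _ in range(horizon):
        values = {
            s: world.human_power(s) + gamma * soft_max(
                [sum(p * values[s2] for p, s2 in world.transition(s, a))
                 for a in world.actions(s)],
                temp,
            )
            for s in world.states
        }
    return values
```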
Problem

Research questions and friction points this paper is trying to address.

Designing an AI objective function for human empowerment and safety
Balancing power between humans and AI in diverse scenarios
Developing algorithms to compute and optimize human power metrics
Innovation

Methods, ideas, or system contributions that make the work stand out.

Parametrizable objective function for human power
Multi-agent reinforcement learning approximation of the power metric
Soft maximization of aggregate human power metrics (sketched below)
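
As one common way to realize soft maximization, a Boltzmann policy samples actions in proportion to exp(Q / T) instead of taking a hard argmax; the Boltzmann form and the `temperature` parameter are assumptions for illustration, since the paper's exact soft-maximization operator is not spelled out in this listing.

```python
import numpy as np

def soft_max_policy(q_values, temperature=1.0):
    """Boltzmann ('soft maximization') action distribution.

    A hard argmax over a power metric could still drive extreme behavior;
    sampling actions with probability proportional to exp(Q / T) keeps the
    agent near-optimal while avoiding relentless optimization. Lower
    temperatures approach the hard argmax.
    """
    q = np.asarray(q_values, dtype=float)
    z = (q - q.max()) / temperature      # shift for numerical stability
    probs = np.exp(z)
    return probs / probs.sum()

# Example: a moderate temperature spreads probability over near-optimal actions.
probs = soft_max_policy([1.2, 0.7, 1.1], temperature=0.5)
action = np.random.default_rng(0).choice(len(probs), p=probs)
```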