🤖 AI Summary
Empowerment-based AI assistance in multi-agent settings may induce *disempowerment*: enhancing one human user's control over the environment while degrading other users' influence and cumulative reward. Prior empowerment-based alignment work assumes a single user in isolation, neglecting the goal conflicts that arise when users' actions are interdependent.
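For reference, empowerment is standardly formalized in the literature as the channel capacity between an agent's $n$-step action sequence and the state it induces (the paper's exact variant may differ):

$$
\mathcal{E}(s_t) \;=\; \max_{p(a_t^{n})}\, I\!\left(A_t^{n};\, S_{t+n} \mid s_t\right)
$$

where $A_t^{n}$ is the sequence of $n$ actions taken from state $s_t$ and $S_{t+n}$ is the resulting state; higher empowerment means more distinct futures the agent can reliably reach.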
Method: The authors formally define disempowerment, introduce the open-source multi-agent grid-world benchmark *Disempower-Grid*, and propose a joint empowerment optimization framework integrating reinforcement learning with information-theoretic empowerment measures.
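As a concrete illustration, a joint objective might reward the assistant for a weighted sum of every human's empowerment rather than a single user's. The sketch below is a minimal Python rendering of that idea; the `env.action_sequences`/`env.simulate` API and the log-reachable-states estimator are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def empirical_empowerment(env, state, agent_id, horizon=3):
    """Estimate horizon-step empowerment for one human agent by counting
    the distinct states its action sequences can reach.

    log2(#reachable states) equals channel capacity when transitions are
    deterministic; it stands in here for a full information-theoretic
    estimator.
    """
    reachable = set()
    for seq in env.action_sequences(agent_id, horizon):    # hypothetical API
        reachable.add(env.simulate(state, agent_id, seq))  # hypothetical API
    return np.log2(len(reachable))

def joint_empowerment_reward(env, state, human_ids, weights=None):
    """Assistant's intrinsic reward: a weighted sum of all humans'
    empowerment instead of a single user's (assumed form of the joint
    objective)."""
    if weights is None:
        weights = [1.0 / len(human_ids)] * len(human_ids)
    return sum(w * empirical_empowerment(env, state, h)
               for w, h in zip(weights, human_ids))
```

Using this as the assistant's intrinsic reward in an RL loop, rather than `empirical_empowerment` for one user alone, is the kind of single-user-versus-joint comparison the experiments run.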
Contribution/Results: Experiments demonstrate that optimizing for a single user's empowerment significantly reduces other users' empowerment and reward; joint optimization mitigates disempowerment but at the cost of the assisted user's own reward. This work is the first to systematically expose the intrinsic misalignment risk of empowerment objectives in multi-agent environments, providing both a novel conceptual lens and a reproducible empirical baseline for AI alignment research.
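The paper's formal definition appears in the full text; one illustrative way to quantify the phenomenon (our notation, not necessarily the authors') is the drop in human $j$'s empowerment when the assistant optimizes human $i$'s:

$$
\mathcal{D}_{j \leftarrow i}(s) \;=\; \mathcal{E}_{j}^{\text{no assist}}(s) \;-\; \mathcal{E}_{j}^{\text{assist }i}(s), \qquad j \neq i,
$$

with $\mathcal{D}_{j \leftarrow i} > 0$ indicating that assisting $i$ has reduced $j$'s control over the environment.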
📝 Abstract
Empowerment, a measure of an agent's ability to control its environment, has been proposed as a universal goal-agnostic objective for motivating assistive behavior in AI agents. While multi-human settings like homes and hospitals are promising for AI assistance, prior work on empowerment-based assistance assumes that the agent assists one human in isolation. We introduce an open-source multi-human gridworld test suite, Disempower-Grid. Using Disempower-Grid, we empirically show that assistive RL agents optimizing for one human's empowerment can significantly reduce another human's environmental influence and rewards, a phenomenon we formalize as disempowerment. We characterize when disempowerment occurs in these environments and show that joint empowerment mitigates disempowerment at the cost of the user's reward. Our work reveals a broader challenge for the AI alignment community: goal-agnostic objectives that seem aligned in single-agent settings can become misaligned in multi-agent contexts.