The AI off-switch problem as a signalling game: bounded rationality and incomparability

📅 2025-02-10
🤖 AI Summary
This paper addresses the AI off-switch (shutdown) problem: the safety risk that arises when an AI system resists being switched off. It models the problem as a signalling game in which a boundedly rational human communicates preferences about an underlying decision problem through messages that may be costly, and the AI then selects actions to maximise the human's utility, about which it may be uncertain. The paper explores several bounded rationality mechanisms and uses real machine learning models. Key contributions are threefold: (1) it reproves prior results and shows that a necessary condition for the AI to refrain from disabling its off-switch is uncertainty about the human's utility; (2) it analyses how message costs influence optimal strategies; and (3) it extends the analysis to scenarios involving incomparability.

📝 Abstract
The off-switch problem is a critical challenge in AI control: if an AI system resists being switched off, it poses a significant risk. In this paper, we model the off-switch problem as a signalling game, where a human decision-maker communicates their preferences about some underlying decision problem to an AI agent, which then selects actions to maximise the human's utility. We assume that the human is a boundedly rational agent and explore various bounded rationality mechanisms. Using real machine learning models, we reprove prior results and demonstrate that a necessary condition for an AI system to refrain from disabling its off-switch is its uncertainty about the human's utility. We also analyse how message costs influence optimal strategies and extend the analysis to scenarios involving incomparability.
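The role of uncertainty can be illustrated with a minimal sketch. It is not the paper's signalling-game model: it assumes a simplified one-shot off-switch setting in which the AI can act directly (bypassing the off-switch), switch itself off (payoff 0), or defer to a human who either allows the action or presses the switch. The function names, the sample-based belief, and the logistic rationality parameter `beta` are all hypothetical choices made for illustration.

```python
import math

def expected_values(utility_samples, beta=None):
    """Expected payoff of each AI option under its belief about the human's utility.

    utility_samples: draws from the AI's belief over the utility U of its action
                     (an assumed, sample-based representation of the belief).
    beta: hypothetical rationality parameter. None means a perfectly rational
          human who allows the action exactly when U > 0; otherwise the human
          allows it with logistic probability 1 / (1 + exp(-beta * U)).
    """
    n = len(utility_samples)
    act = sum(utility_samples) / n   # bypass the off-switch and act
    off = 0.0                        # switch itself off

    def p_allow(u):
        if beta is None:
            return 1.0 if u > 0 else 0.0
        return 1.0 / (1.0 + math.exp(-beta * u))

    # Deferring: the action only happens when the human allows it.
    defer = sum(p_allow(u) * u for u in utility_samples) / n
    return {"act": act, "off": off, "defer": defer}

def deference_incentive(utility_samples, beta=None):
    """How much better deferring is than the best non-deferring option."""
    v = expected_values(utility_samples, beta)
    return v["defer"] - max(v["act"], v["off"])

# No uncertainty: the AI gains nothing by leaving the switch in human hands.
print(deference_incentive([1.0, 1.0]))
# Uncertainty over the sign of U: deferring to a rational human is strictly better.
print(deference_incentive([-1.0, 2.0]))
# A human who responds at random (beta = 0) removes the incentive again.
print(deference_incentive([-1.0, 2.0], beta=0.0))
```

In this toy setting the incentive to defer is zero when the belief is a point mass and positive only when the AI is unsure whether its action helps, which matches the necessary-condition reading of the abstract. The `beta` variant hints at why the human's bounded rationality matters: the noisier the human's decision, the less the AI gains by deferring.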
Problem

Research questions and friction points this paper is trying to address.

AI off-switch control challenge
Signalling game for AI decision-making
Bounded rationality of the human decision-maker
Innovation

Methods, ideas, or system contributions that make the work stand out.

AI off-switch as signalling game
Bounded rationality mechanisms applied
Uncertainty about human utility as a necessary condition for not disabling the off-switch