The Intelligent Disobedience Game: Formulating Disobedience in Stackelberg Games and Markov Decision Processes

📅 2026-03-21
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
This work addresses the challenge of enabling agents in shared autonomy systems to make safe and principled decisions that balance compliance with human commands against the necessity of intelligently disobeying them to prevent harm. The authors propose the Intelligent Disobedience Game (IDG), a sequential interaction framework grounded in Stackelberg game theory, wherein the human is modeled as the leader and the agent as a follower operating under information asymmetry. The framework is formalized as a multi-agent Markov decision process to facilitate reinforcement learning-based training. This study presents the first formal model of "intelligent disobedience," uncovers strategic phenomena such as "safety traps," and establishes a computationally tractable theoretical and experimental foundation for developing agents capable of safe non-compliance and for investigating human trust in disobedient AI systems.

๐Ÿ“ Abstract
In shared autonomy, a critical tension arises when an automated assistant must choose between obeying a human's instruction and deliberately overriding it to prevent harm. This safety-critical behavior is known as intelligent disobedience. To formalize this dynamic, this paper introduces the Intelligent Disobedience Game (IDG), a sequential game-theoretic framework based on Stackelberg games that models the interaction between a human leader and an assistive follower operating under asymmetric information. The paper characterizes optimal strategies for both agents across multi-step scenarios, identifying strategic phenomena such as "safety traps," where the system indefinitely avoids harm but fails to achieve the human's goal. The IDG provides a needed mathematical foundation that enables both the algorithmic development of agents that can learn safe non-compliance and the empirical study of how humans perceive and trust disobedient AI. The paper further translates the IDG into a shared control Multi-Agent Markov Decision Process representation, forming a compact computational testbed for training reinforcement learning agents.
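The leader-follower dynamic described above can be sketched as a minimal decision rule: the follower observes risk information the human leader lacks and complies only when the expected value of obeying is at least that of safely refusing. All names, payoffs, and the threshold logic below are illustrative assumptions for intuition, not the paper's actual IDG model.

```python
# Hypothetical sketch of a follower's best response under asymmetric
# information in a Stackelberg-style "intelligent disobedience" setting.
# The payoff structure and decide_follower_action helper are assumptions.

def decide_follower_action(command, hazard_prob, harm_cost, progress_reward):
    """Return "comply" or "disobey" for the leader's command.

    The follower privately observes hazard_prob (the chance that executing
    the command causes harm), which the human leader cannot see. Refusing
    is modeled as yielding zero progress and zero harm.
    """
    expected_value_comply = progress_reward - hazard_prob * harm_cost
    expected_value_refuse = 0.0  # disobey: no progress, but no harm
    if expected_value_comply >= expected_value_refuse:
        return "comply"
    return "disobey"

# Guide-dog-style example: the human commands "forward", but only the
# assistant perceives how likely a hazard is.
print(decide_follower_action("forward", hazard_prob=0.8,
                             harm_cost=10.0, progress_reward=1.0))
print(decide_follower_action("forward", hazard_prob=0.05,
                             harm_cost=10.0, progress_reward=1.0))
```

Note that if every compliant action has negative expected value, this rule refuses forever and never reaches the human's goal, which is exactly the "safety trap" phenomenon the abstract names.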
Problem

Research questions and friction points this paper is trying to address.

intelligent disobedience
shared autonomy
Stackelberg games
asymmetric information
safety-critical systems
Innovation

Methods, ideas, or system contributions that make the work stand out.

Intelligent Disobedience
Stackelberg Game
Shared Autonomy
Markov Decision Process
Safety-Critical AI