The Intelligent Disobedience Game: Formulating Disobedience in Stackelberg Games and Markov Decision Processes

📅 2026-03-21
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
This work addresses the challenge of enabling agents in shared autonomy systems to make safe and principled decisions that balance compliance with human commands against the necessity of intelligently disobeying them to prevent harm. The authors propose the Intelligent Disobedience Game (IDG), a sequential interaction framework grounded in Stackelberg game theory, wherein the human is modeled as the leader and the agent as a follower operating under information asymmetry. The framework is formalized as a multi-agent Markov decision process to facilitate reinforcement learning-based training. This study presents the first formal model of "intelligent disobedience," uncovers strategic phenomena such as "safety traps," and establishes a computationally tractable theoretical and experimental foundation for developing agents capable of safe non-compliance and for investigating human trust in disobedient AI systems.

๐Ÿ“ Abstract
In shared autonomy, a critical tension arises when an automated assistant must choose between obeying a human's instruction and deliberately overriding it to prevent harm. This safety-critical behavior is known as intelligent disobedience. To formalize this dynamic, this paper introduces the Intelligent Disobedience Game (IDG), a sequential game-theoretic framework based on Stackelberg games that models the interaction between a human leader and an assistive follower operating under asymmetric information. The paper characterizes optimal strategies for both agents across multi-step scenarios, identifying strategic phenomena such as "safety traps," where the system indefinitely avoids harm but fails to achieve the human's goal. The IDG provides a needed mathematical foundation that enables both the algorithmic development of agents that can learn safe non-compliance and the empirical study of how humans perceive and trust disobedient AI. The paper further translates the IDG into a shared control Multi-Agent Markov Decision Process representation, forming a compact computational testbed for training reinforcement learning agents.
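The leader-follower dynamic described above can be sketched as a minimal decision rule: the follower observes risk information the human leader lacks and complies only when the expected value of obeying is at least that of safely refusing. All names, payoffs, and the threshold logic below are illustrative assumptions for intuition, not the paper's actual IDG model.

```python
# Hypothetical sketch of a follower's best response under asymmetric
# information in a Stackelberg-style "intelligent disobedience" setting.
# The payoff structure and decide_follower_action helper are assumptions.

def decide_follower_action(command, hazard_prob, harm_cost, progress_reward):
    """Return "comply" or "disobey" for the leader's command.

    The follower privately observes hazard_prob (the chance that executing
    the command causes harm), which the human leader cannot see. Refusing
    is modeled as yielding zero progress and zero harm.
    """
    expected_value_comply = progress_reward - hazard_prob * harm_cost
    expected_value_refuse = 0.0  # disobey: no progress, but no harm
    if expected_value_comply >= expected_value_refuse:
        return "comply"
    return "disobey"

# Guide-dog-style example: the human commands "forward", but only the
# assistant perceives how likely a hazard is.
print(decide_follower_action("forward", hazard_prob=0.8,
                             harm_cost=10.0, progress_reward=1.0))
print(decide_follower_action("forward", hazard_prob=0.05,
                             harm_cost=10.0, progress_reward=1.0))
```

Note that if every compliant action has negative expected value, this rule refuses forever and never reaches the human's goal, which is exactly the "safety trap" phenomenon the abstract names.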
Problem

Research questions and friction points this paper is trying to address.

intelligent disobedience
shared autonomy
Stackelberg games
asymmetric information
safety-critical systems
Innovation

Methods, ideas, or system contributions that make the work stand out.

Intelligent Disobedience
Stackelberg Game
Shared Autonomy
Markov Decision Process
Safety-Critical AI