🤖 AI Summary
This work models the incentives that shape agent decision-making, systematically distinguishing *response incentives* (environmental variables that the optimal policy responds to), *instrumental control incentives* (variables the agent benefits from manipulating, e.g., user preferences), and *influence incentives* (variables the agent alters, whether intentionally or not). We propose the structural causal influence model (SCIM), the first framework unifying influence diagrams with structural causal models. Building on causal reasoning, graph theory, and formal modeling, we derive the first decidable graphical criteria for identifying and classifying all three incentive types in single-decision settings. Our approach enables precise, theoretically grounded incentive attribution, enhancing the interpretability and controllability of agent behavior. We illustrate how the framework predicts behavioral tendencies in fairness- and AI safety-relevant applications, supporting more robust and transparent autonomous decision-making.
📝 Abstract
Which variables does an agent have an incentive to control with its decision, and which variables does it have an incentive to respond to? We formalise these incentives, and demonstrate unique graphical criteria for detecting them in any single-decision causal influence diagram. To this end, we introduce structural causal influence models, a hybrid of the influence diagram and structural causal model frameworks. Finally, we illustrate how these incentives predict agent behaviour in both fairness and AI safety applications.
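To give a flavour of the kind of graphical criterion the abstract refers to: in a single-decision causal influence diagram, an instrumental control incentive on a variable X requires a directed path from the decision D to a utility node U that passes through X. Below is a minimal sketch of that path check; the adjacency-dict graph encoding and the node names in the example are our own illustration, not from the paper.

```python
def directed_path_through(graph, start, via, goals):
    """True if some directed path start -> ... -> via -> ... -> goal exists,
    where graph maps each node to a list of its children."""
    def reachable(src, targets):
        # Iterative depth-first search for any node in `targets`.
        stack, seen = [src], set()
        while stack:
            node = stack.pop()
            if node in targets:
                return True
            if node in seen:
                continue
            seen.add(node)
            stack.extend(graph.get(node, []))
        return False
    # A path D -> ... -> X combined with a path X -> ... -> U
    # gives a directed path from D to U through X.
    return reachable(start, {via}) and reachable(via, set(goals))

# Illustrative content-recommendation diagram (hypothetical node names):
cid = {
    "D": ["X", "U"],  # decision: which posts to show
    "X": ["U"],       # user opinion, influenced by D
    "U": [],          # utility: predicted clicks
    "Z": ["D"],       # observed context; D cannot reach Z
}
print(directed_path_through(cid, "D", "X", ["U"]))  # True: incentive to control X
print(directed_path_through(cid, "D", "Z", ["U"]))  # False: no incentive on Z
```

The contrast between X and Z is the point of the criterion: the agent can only have an instrumental control incentive on variables its decision can actually influence on the way to utility.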