🤖 AI Summary
Problem: Prior research lacks a statistically grounded definition of “appropriate human reliance on AI,” which leads to conceptual ambiguity and can produce contradictions. Method: This paper establishes a formal framework rooted in statistical decision theory. It defines reliance as the probability that a human follows an AI’s recommendation, which separates reliance behavior (action selection) from the human's ability to differentiate signals and form accurate beliefs. Using a baseline and a benchmark defined by the expected payoff of a rational decision-maker facing the same task, the framework yields two separately quantifiable losses: mis-reliance loss and loss due to not accurately differentiating the signals. Contribution/Results: Applied to recent AI-advised decision-making studies from the literature, the framework separates and quantifies these two sources of loss in human-AI performance. It offers a formal basis for designing and interpreting studies on human-AI complementarity and reliance.
📝 Abstract
Humans frequently make decisions with the aid of artificially intelligent (AI) systems. A common pattern is for the AI to recommend an action to the human, who retains control over the final decision. Researchers have identified ensuring that a human has appropriate reliance on an AI as a critical component of achieving complementary performance. We argue that the current definition of appropriate reliance used in such research lacks formal statistical grounding and can lead to contradictions. We propose a formal definition of reliance, based on statistical decision theory, which separates reliance, defined as the probability that the decision-maker follows the AI’s recommendation, from the challenges a human may face in differentiating the signals and forming accurate beliefs about the situation. Our definition gives rise to a framework that can be used to guide the design and interpretation of studies on human-AI complementarity and reliance. Using recent AI-advised decision-making studies from the literature, we demonstrate how our framework can be used to separate the loss due to mis-reliance from the loss due to not accurately differentiating the signals. We evaluate these losses by comparing them to a baseline and a benchmark for complementary performance defined by the expected payoff achieved by a rational decision-maker facing the same decision task as the behavioral decision-makers.
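To make the decomposition described above concrete, the following Python sketch works through a toy binary task. Everything in it is an assumption made for illustration rather than the paper's formal definitions: the `Signal` record, the numbers, the payoff (1 for a correct final decision, 0 otherwise), the convention that not following the AI means taking the opposite action, and the choice to measure mis-reliance loss against the best payoff attainable at the observed overall reliance level.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Signal:
    """Hypothetical summary of one signal the human can observe."""
    prob: float         # P(signal); assumed to sum to 1 across signals
    ai_accuracy: float  # P(AI recommendation is correct | signal)
    follow_rate: float  # observed P(human follows the AI | signal)


def payoff(signals, follow_rates):
    """Expected accuracy when the AI is followed at the given per-signal rates.

    Assumes a binary action: not following means choosing the opposite action,
    which is correct exactly when the AI is wrong.
    """
    return sum(
        s.prob * (r * s.ai_accuracy + (1 - r) * (1 - s.ai_accuracy))
        for s, r in zip(signals, follow_rates)
    )


def decompose(signals):
    """Split the gap to the rational benchmark into two illustrative losses."""
    accuracy = sum(s.prob * s.ai_accuracy for s in signals)
    # Rational baseline: best policy that ignores the signal entirely.
    baseline = max(accuracy, 1 - accuracy)
    # Rational benchmark: follow the AI exactly on signals where it helps.
    benchmark = payoff(
        signals, [1.0 if s.ai_accuracy >= 0.5 else 0.0 for s in signals]
    )
    behavioral = payoff(signals, [s.follow_rate for s in signals])
    # Reliance: overall probability that the human follows the AI.
    reliance = sum(s.prob * s.follow_rate for s in signals)

    # Best payoff at the observed reliance level: spend the reliance "budget"
    # greedily on the signals where the AI is most accurate.
    remaining = reliance
    rates = [0.0] * len(signals)
    order = sorted(
        range(len(signals)), key=lambda i: signals[i].ai_accuracy, reverse=True
    )
    for i in order:
        take = min(signals[i].prob, remaining)
        rates[i] = take / signals[i].prob if signals[i].prob > 0 else 0.0
        remaining -= take
    same_reliance_best = payoff(signals, rates)

    return {
        "baseline": baseline,
        "benchmark": benchmark,
        "behavioral": behavioral,
        "reliance": reliance,
        "mis_reliance_loss": benchmark - same_reliance_best,
        "discrimination_loss": same_reliance_best - behavioral,
    }


# Hypothetical data: the AI is accurate on "high confidence" signals only.
example = [
    Signal(prob=0.5, ai_accuracy=0.9, follow_rate=0.7),
    Signal(prob=0.5, ai_accuracy=0.4, follow_rate=0.6),
]
result = decompose(example)
```

With these made-up numbers, the mis-reliance loss is about 0.03 and the discrimination loss about 0.15. By construction the two sum to the gap between the rational benchmark and the behavioral payoff, so a study could report how much of that gap comes from following the AI too often or too rarely overall, and how much from following it on the wrong signals.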