A Causal Framework for Evaluating Deferring Systems

📅 2024-05-29
🏛️ arXiv.org
📈 Citations: 1
Influential: 0
🤖 AI Summary
Existing evaluations of deferral strategies in deferring systems inadequately assess the causal effect of deferral on prediction accuracy, which can lead to biased estimates of the value of human-in-the-loop policies. Method: We propose a causal inference framework that links the potential outcomes model with deferring systems and uses regression discontinuity design (RDD) to quantify the impact of “defer-to-human” policies on overall accuracy. The framework covers two settings. In the first, both human and model predictions are available for deferred instances, which enables estimation of individual causal effects and their aggregates. In the second, only human predictions are available, and a local causal effect is estimated via RDD. Contribution/Results: We evaluate the approach on synthetic and real datasets for seven deferring systems from the literature. Compared to conventional evaluation methods, the framework improves the credibility of causal identification. Empirical results indicate that conventional evaluations can overestimate or underestimate the effect of widely adopted deferral strategies, exposing flaws in current practice. This work provides a basis for reliable causal assessment and policy optimization in deferring ML systems.

📝 Abstract
Deferring systems extend supervised Machine Learning (ML) models with the possibility to defer predictions to human experts. However, evaluating the impact of a deferring strategy on system accuracy is still an overlooked area. This paper fills this gap by evaluating deferring systems through a causal lens. We link the potential outcomes framework for causal inference with deferring systems, which allows us to identify the causal impact of the deferring strategy on predictive accuracy. We distinguish two scenarios. In the first, we have access to both the human and the ML model predictions for the deferred instances. Here, we can identify the individual causal effects for deferred instances and their aggregates. In the second, only human predictions are available for the deferred instances. Here, we can resort to regression discontinuity designs to estimate a local causal effect. We evaluate our approach on synthetic and real datasets for seven deferring systems from the literature.
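To make the first scenario concrete, here is a minimal Python sketch, assuming the outcome is per-instance correctness (1 if correct, 0 otherwise), the treatment is the deferral decision, the human prediction supplies Y_i(1) and the model prediction supplies Y_i(0). The helper `deferral_effects`, its `DeferralEffects` result type and the toy data are hypothetical illustrations, not code from the paper. Because non-deferred instances receive the model prediction under both policies, the change in overall accuracy relative to never deferring reduces to (1/n) times the sum of tau_i = Y_i(1) - Y_i(0) over deferred instances; averaging tau_i over deferred instances instead gives an average-treatment-effect-on-the-treated style quantity.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class DeferralEffects:
    individual: List[int]  # tau_i in {-1, 0, 1}, one per deferred instance
    att: float             # mean of tau_i over deferred instances
    accuracy_gain: float   # system accuracy minus model-only accuracy


def deferral_effects(
    y_true: Sequence[int],
    y_model: Sequence[int],
    y_human: Sequence[int],
    deferred: Sequence[bool],
) -> DeferralEffects:
    """Hypothetical helper for the scenario where both the human and the
    model prediction are known for every deferred instance.

    Outcome: 1 if a prediction is correct, 0 otherwise.
    Y_i(1) = correctness of the human prediction (deferred, "treated").
    Y_i(0) = correctness of the model prediction (not deferred, "control").
    """
    n = len(y_true)
    if not (len(y_model) == len(y_human) == len(deferred) == n):
        raise ValueError("all inputs must have the same length")
    # Individual effect tau_i = Y_i(1) - Y_i(0); observable here only because
    # both predictions are available for deferred instances.
    individual = [
        int(h == t) - int(m == t)
        for t, m, h, d in zip(y_true, y_model, y_human, deferred)
        if d
    ]
    att = sum(individual) / len(individual) if individual else 0.0
    # Non-deferred instances get the model prediction under both policies,
    # so they contribute zero to the difference in overall accuracy.
    accuracy_gain = sum(individual) / n if n else 0.0
    return DeferralEffects(individual, att, accuracy_gain)


# Toy usage; y_human entries for non-deferred instances are ignored.
y_true = [1, 0, 1, 1, 0, 1]
y_model = [1, 1, 0, 1, 0, 0]
y_human = [1, 0, 1, 0, 0, 0]
deferred = [False, True, True, True, False, True]
effects = deferral_effects(y_true, y_model, y_human, deferred)
print(effects)  # individual=[1, 1, -1, 0], att=0.25, accuracy_gain≈0.1667
```

In the toy data the model alone is correct on 3 of 6 instances and the deferring system on 4 of 6, which matches the computed accuracy gain of 1/6.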
Problem

Research questions and friction points this paper is trying to address.

Evaluating impact of deferring strategy on system accuracy
Linking causal inference with deferring systems
Estimating causal effects in two prediction scenarios
Innovation

Methods, ideas, or system contributions that make the work stand out.

Causal framework for deferring systems evaluation
Potential outcomes linked to deferring strategies
Regression discontinuity for local causal effects
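For the second scenario, where only the human prediction is observed for deferred instances, the abstract points to regression discontinuity designs. The sketch below is a basic sharp RDD under hypothetical assumptions: the system defers exactly when a scalar score (for example, model confidence) falls below a threshold, and the local effect is the jump in expected correctness at that cutoff, estimated by separate linear fits on each side within a fixed bandwidth and a uniform kernel. The function `rdd_local_effect` and the simulated accuracies are illustrative only; dedicated packages such as rdrobust add data-driven bandwidth selection and robust confidence intervals.

```python
import numpy as np


def rdd_local_effect(score, correct, threshold, bandwidth):
    """Sharp RDD estimate of the local effect of deferring at the cutoff.

    Hypothetical setup (an assumption, not the paper's exact design):
    - an instance is deferred iff score < threshold (e.g. low model confidence);
    - correct[i] is 1 if the prediction actually used (human if deferred,
      model otherwise) was correct, else 0;
    - uniform kernel, fixed bandwidth, separate linear fit on each side.
    Returns fitted correctness on the deferred side minus fitted correctness
    on the model side, both evaluated at the cutoff.
    """
    score = np.asarray(score, dtype=float)
    correct = np.asarray(correct, dtype=float)
    x = score - threshold                    # centre the running variable at 0
    window = np.abs(x) <= bandwidth
    left = window & (x < 0)                  # deferred to the human
    right = window & (x >= 0)                # kept by the model
    if left.sum() < 2 or right.sum() < 2:
        raise ValueError("need at least two points on each side of the cutoff")
    # np.polyfit with deg=1 returns [slope, intercept]; with x centred, the
    # intercept is the fitted outcome exactly at the cutoff.
    _, at_cutoff_human = np.polyfit(x[left], correct[left], deg=1)
    _, at_cutoff_model = np.polyfit(x[right], correct[right], deg=1)
    return float(at_cutoff_human - at_cutoff_model)


# Simulated example with a known jump of 0.85 - 0.70 = 0.15 at the cutoff.
rng = np.random.default_rng(0)
n = 50_000
score = rng.uniform(0.0, 1.0, n)
deferred = score < 0.5
# Hypothetical accuracies: humans 0.85; model 0.5 + 0.4 * score (0.70 at 0.5).
p_correct = np.where(deferred, 0.85, 0.5 + 0.4 * score)
correct = rng.uniform(size=n) < p_correct
print(rdd_local_effect(score, correct, threshold=0.5, bandwidth=0.2))
```

The printed estimate should land close to the true jump of 0.15. Note that it is only a local effect: it describes instances near the deferral threshold, not deferred instances far below it.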