Policy Learning with Distributional Welfare

📅 2023-11-27
📈 Citations: 3
Influential: 1
🤖 AI Summary
This paper addresses the vulnerability of conventional utilitarian policy learning—based on the conditional average treatment effect (CATE)—to outliers, and its inability to flexibly accommodate policy caution or leniency under heterogeneous individual treatment effects. The authors propose an optimal treatment allocation framework grounded in the conditional quantile of individual treatment effects (QoTE). To their knowledge, this is the first work to incorporate distributionally robust welfare into policy learning, formulating a minimax strategy based on the QoTE that addresses the fundamental challenge that the joint distribution of counterfactual outcomes is not point-identified. The method integrates causal inference, distributionally robust optimization, and decision theory, supporting both stochastic and deterministic policies under a range of identifying assumptions. The authors establish an asymptotically tight upper bound on the regret and demonstrate robustness to model misspecification. The framework generalizes to any welfare objective defined as a functional of the joint distribution of the potential outcomes.
📝 Abstract
In this paper, we explore optimal treatment allocation policies that target distributional welfare. Most literature on treatment choice has considered utilitarian welfare based on the conditional average treatment effect (CATE). While average welfare is intuitive, it may yield undesirable allocations, especially when individuals are heterogeneous (e.g., with outliers), which is the very reason individualized treatments were introduced in the first place. This observation motivates us to propose an optimal policy that allocates the treatment based on the conditional quantile of individual treatment effects (QoTE). Depending on the choice of the quantile probability, this criterion can accommodate a policymaker who is either prudent or negligent. The challenge of identifying the QoTE lies in its requirement for knowledge of the joint distribution of the counterfactual outcomes, which is not generally point-identified. We introduce minimax policies that are robust to this model uncertainty. A range of identifying assumptions can be used to yield more informative policies. For both stochastic and deterministic policies, we establish the asymptotic bound on the regret of implementing the proposed policies. The framework can be generalized to any setting where welfare is defined as a functional of the joint distribution of the potential outcomes.
Problem

Research questions and friction points this paper is trying to address.

Optimal treatment allocation targeting distributional welfare, not just average effects
Identifying policies robust to uncertainty about the joint distribution of counterfactual outcomes
Generalizing welfare frameworks beyond utilitarian criteria for heterogeneous populations
Innovation

Methods, ideas, or system contributions that make the work stand out.

Optimal policy targets the conditional quantile of individual treatment effects (QoTE)
Minimax policies robust to non-point-identification of the counterfactual joint distribution
Asymptotic regret bounds for both stochastic and deterministic policies
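The QoTE depends on the joint distribution of the potential outcomes (Y0, Y1), which is only partially identified from data: the marginals of Y0 and Y1 pin down sharp (Makarov) bounds on the distribution of the treatment effect Y1 − Y0, and inverting those bounds gives an interval for each quantile. The sketch below is an illustrative numerical exercise, not the paper's estimator: it computes Makarov bounds from synthetic samples and applies a prudent minimax-style rule (treat only if even the worst-case quantile is positive). All function names are hypothetical.

```python
import numpy as np

def makarov_cdf_bounds(y0, y1, delta, grid):
    """Sharp bounds on P(Y1 - Y0 <= delta) implied by the marginal CDFs alone,
    since the joint distribution of (Y0, Y1) is not point-identified."""
    F1 = np.mean(y1[:, None] <= grid, axis=0)          # marginal CDF of Y1 on grid
    F0 = np.mean(y0[:, None] <= grid - delta, axis=0)  # marginal CDF of Y0, shifted
    diff = F1 - F0
    lower = max(diff.max(), 0.0)        # sup_y max(F1(y) - F0(y - delta), 0)
    upper = 1.0 + min(diff.min(), 0.0)  # 1 + inf_y min(F1(y) - F0(y - delta), 0)
    return lower, upper

def qote_bounds(y0, y1, tau, deltas, grid):
    """Invert the CDF bounds to bound the tau-quantile of Y1 - Y0:
    the upper CDF bound yields the lower quantile bound, and vice versa."""
    pairs = [makarov_cdf_bounds(y0, y1, d, grid) for d in deltas]
    lowers = np.array([p[0] for p in pairs])  # nondecreasing in delta
    uppers = np.array([p[1] for p in pairs])  # nondecreasing in delta
    q_lower = deltas[np.searchsorted(uppers, tau)]  # smallest delta with F_U >= tau
    q_upper = deltas[np.searchsorted(lowers, tau)]  # smallest delta with F_L >= tau
    return q_lower, q_upper

rng = np.random.default_rng(0)
y0 = rng.normal(0.0, 1.0, 2000)   # synthetic control outcomes
y1 = rng.normal(1.0, 1.0, 2000)   # synthetic treated outcomes (+1 mean shift)

grid = np.linspace(-5.0, 5.0, 201)
deltas = np.linspace(-4.0, 6.0, 201)
q_lo, q_hi = qote_bounds(y0, y1, tau=0.25, deltas=deltas, grid=grid)

# A prudent (minimax) rule treats only when the worst-case 25%-QoTE is positive:
treat = q_lo > 0
print(f"25%-QoTE bounds: [{q_lo:.2f}, {q_hi:.2f}], treat = {treat}")
```

Even though the average effect here is clearly positive, the worst-case lower quantile bound can be negative, so the prudent rule withholds treatment, which illustrates how the quantile probability tunes the policymaker's caution.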
Yifan Cui
Zhejiang University
Statistics · Inference · Learning
Sukjin Han
School of Economics, University of Bristol