Policy Learning with $\alpha$-Expected Welfare

📅 2025-05-01

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This paper addresses fairness and robustness concerns in policy learning by proposing the α-expected welfare maximization (α-EWM) criterion: maximizing the average welfare of the worst-off α-fraction of the post-treatment outcome distribution, which gives a continuous trade-off between expected welfare (α = 1) and Rawlsian welfare (α → 0). For α in (0,1), α-EWM can be read as a distributionally robust EWM problem in which the target population may differ in distribution from the study population. Using the dual representation of the α-expected welfare function, the paper constructs a debiased estimator of the optimal policy, establishes asymptotic upper bounds on its regret, and develops asymptotically valid inference for the optimal welfare. The finite-sample performance of the estimator and the inference procedure is examined on both synthetic and real data.

📝 Abstract

This paper proposes an optimal policy that targets the average welfare of the worst-off $\alpha$-fraction of the post-treatment outcome distribution. We refer to this policy as the $\alpha$-Expected Welfare Maximization ($\alpha$-EWM) rule, where $\alpha \in (0,1]$ denotes the size of the subpopulation of interest. The $\alpha$-EWM rule interpolates between the expected welfare ($\alpha = 1$) and the Rawlsian welfare ($\alpha \rightarrow 0$). For $\alpha \in (0,1)$, an $\alpha$-EWM rule can be interpreted as a distributionally robust EWM rule that allows the target population to have a different distribution than the study population. Using the dual formulation of our $\alpha$-expected welfare function, we propose a debiased estimator for the optimal policy and establish its asymptotic upper regret bounds. In addition, we develop asymptotically valid inference for the optimal welfare based on the proposed debiased estimator. We examine the finite sample performance of the debiased estimator and inference via both real and synthetic data.
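To make the welfare criterion concrete, here is a minimal sketch of how the average outcome of the worst-off $\alpha$-fraction could be computed from a sample, once by sorting and once through a dual form. The abstract does not state the paper's dual, so the code assumes the standard lower-tail representation $\max_{\eta} \{ \eta - \mathbb{E}[(\eta - Y)_+] / \alpha \}$. The function names are hypothetical, and this is a plain plug-in calculation, not the paper's debiased estimator.

```python
import numpy as np

def alpha_welfare_sorted(y, alpha):
    """Average of the worst-off alpha-fraction of outcomes (sort-based)."""
    y = np.sort(np.asarray(y, dtype=float))
    n = y.size
    k = alpha * n                  # (possibly fractional) number of units in the tail
    m = int(np.floor(k))
    total = y[:m].sum()
    if m < n and k > m:
        total += (k - m) * y[m]    # partial weight on the boundary observation
    return total / k

def alpha_welfare_dual(y, alpha):
    """Same quantity via the assumed dual: max over eta of eta - E[(eta - Y)_+] / alpha."""
    y = np.asarray(y, dtype=float)
    # The objective is concave and piecewise linear in eta with kinks at the
    # sample points, so it is enough to search over the observed outcomes.
    return max(eta - np.maximum(eta - y, 0.0).mean() / alpha for eta in y)

outcomes = [1.0, 2.0, 3.0, 4.0]    # illustrative post-treatment outcomes
w_half_sorted = alpha_welfare_sorted(outcomes, 0.5)
w_half_dual = alpha_welfare_dual(outcomes, 0.5)
w_full = alpha_welfare_sorted(outcomes, 1.0)
```

On this toy sample, $\alpha = 0.5$ averages the two lowest outcomes, while $\alpha = 1$ recovers the ordinary mean, matching the interpolation described in the abstract. Under this reading, a policy would be chosen by comparing these values across the outcome distributions that candidate policies induce.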

Problem

Research questions and friction points this paper is trying to address.

Develops optimal policy for worst-off α-fraction welfare

Proposes debiased estimator with asymptotic regret bounds

Validates inference methods via real and synthetic data

Innovation

Methods, ideas, or system contributions that make the work stand out.

Targets worst-off α-fraction welfare

Uses debiased estimator for optimal policy

Develops asymptotically valid inference

🔎 Similar Papers

Efficient Multi-Policy Evaluation for Reinforcement Learning