🤖 AI Summary
This paper addresses the challenge of learning optimal treatment rules when interventions (such as social services or insurance enrollment) are individually self-selected, so that treatment cannot be enforced and noncompliance and potential positivity violations must be handled alongside fairness considerations. The authors propose a unified policy optimization framework integrating causal identification, robust estimation, and fairness-aware constraints. Methodologically, the framework combines testing of a covariate-conditional exclusion restriction, two-stage variance-sensitive policy learning, and demographic-parity constraints on treatment take-up, yielding estimators with variance-sensitive regret bounds. In three real-world case studies (SNAP recertification reminders, randomized encouragement to enroll in health insurance, and pretrial supervised release with electronic monitoring) the approach improves participation among disadvantaged subpopulations while preserving overall causal utility, demonstrating both theoretical rigor and practical relevance.
📝 Abstract
In consequential domains, it is often impossible to compel individuals to take treatment, so that optimal policy rules are merely suggestions in the presence of human non-adherence to treatment recommendations. Under heterogeneity, covariates may predict take-up of treatment and final outcome, but differently. While optimal treatment rules optimize causal outcomes across the population, access parity constraints or other fairness considerations on who receives treatment can be important. For example, in social services, a persistent puzzle is the gap in take-up of beneficial services among those who may benefit from them the most. We study causal identification and robust estimation of optimal treatment rules, including under potential violations of positivity. We consider fairness constraints such as demographic parity in treatment take-up, and other constraints, via constrained optimization. Our framework can be extended to handle algorithmic recommendations under an often-reasonable covariate-conditional exclusion restriction, using our robustness checks for lack of positivity in the recommendation. We develop a two-stage algorithm for solving over parametrized policy classes under general constraints to obtain variance-sensitive regret bounds. We illustrate the methods in three case studies based on data from reminders of SNAP benefits recertification, randomized encouragement to enroll in insurance, and pretrial supervised release with electronic monitoring. While the specific remedy to inequities in algorithmic allocation is context-specific, it requires studying both the take-up of decisions and their downstream outcomes.
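As a rough illustration of the kind of constrained optimization the abstract describes (not the authors' actual two-stage algorithm), the sketch below learns a parametrized logistic policy by penalizing the gap in expected treatment take-up between two demographic groups, in the spirit of demographic parity in take-up under noncompliance. Everything here is a synthetic stand-in: the covariates, the group indicator `A`, the estimated treatment effect `tau`, the take-up model, and the crude random-search optimizer are all hypothetical choices for the sketch.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-ins for data and first-stage estimates.
n, d = 2000, 3
X = rng.normal(size=(n, d))                             # covariates
A = rng.integers(0, 2, size=n)                          # protected group indicator
tau = 0.5 * X[:, 0] - 0.2 * A                           # stand-in estimated CATE
takeup = 1 / (1 + np.exp(-(X[:, 1] + 0.8 * A - 0.5)))   # stand-in take-up probability

Xb = np.hstack([X, np.ones((n, 1))])                    # intercept for the policy

def policy_prob(theta):
    """Parametrized (logistic) treatment-recommendation policy pi_theta(x)."""
    return 1 / (1 + np.exp(-Xb @ theta))

def takeup_gap(theta):
    """Absolute gap in expected take-up between the two groups."""
    r = policy_prob(theta) * takeup
    return abs(r[A == 0].mean() - r[A == 1].mean())

def objective(theta, lam):
    """Estimated policy value (outcomes accrue only on take-up)
    minus a Lagrangian penalty on the take-up parity gap."""
    pi = policy_prob(theta)
    value = np.mean(pi * takeup * tau)
    return value - lam * takeup_gap(theta)

def fit(lam, iters=500):
    """Crude random-search hill climbing: enough for a sketch, not real use."""
    best = np.zeros(d + 1)
    best_val = objective(best, lam)
    for _ in range(iters):
        cand = best + 0.3 * rng.normal(size=d + 1)
        val = objective(cand, lam)
        if val > best_val:
            best, best_val = cand, val
    return best

theta_uncon = fit(lam=0.0)   # maximize estimated value only
theta_fair = fit(lam=5.0)    # trade value for take-up parity

print("take-up gap, unconstrained:", round(takeup_gap(theta_uncon), 4))
print("take-up gap, parity-penalized:", round(takeup_gap(theta_fair), 4))
```

Raising the penalty weight `lam` shrinks the between-group take-up gap at some cost in estimated value; the paper's framework instead handles such constraints with formal guarantees (variance-sensitive regret bounds), which this toy penalty method does not provide.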