Set-Valued Policy Learning

📅 2026-05-19

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work proposes the first set-valued framework for learning causal treatment policies. It addresses key limitations of conventional single-intervention recommendations, which are sensitive to model specification, estimation uncertainty, and finite-sample variability, and which offer no quantifiable confidence in their prescriptions. In multi-treatment settings, the framework outputs a set of plausible treatment options, where the cardinality of the set reflects decision ambiguity. The method integrates conformal prediction with a learnable deferral mechanism, using a greatest-lower-bound (max-min) optimization together with a randomness-injection technique inspired by the noisy-label literature. It guarantees marginal coverage without imposing structural assumptions on the underlying optimal treatment rule, and it handles the fact that optimal treatments are never directly observed. Empirical evaluations on synthetic data and a real-world in vitro fertilization (IVF) application indicate that the learned policies are clinically actionable and robust, and that they achieve a favorable trade-off between performance and reliability.

📝 Abstract

Conventional treatment policies map patient covariates to a single recommended intervention in order to maximize expected clinical outcomes. Although a rich body of causal inference methods has been developed to estimate such policies, point-valued recommendations can be highly sensitive to estimation uncertainty, model specification, and finite-sample variability, while typically providing little guidance about how confident one should be in the recommended action. In this work, we propose a set-valued policy learning paradigm for the multiple-treatment setting, in which policies output a set of plausible treatments rather than a single recommendation. This formulation enables intrinsic uncertainty quantification, with the size of the predicted set reflecting the degree of decision ambiguity. We extend the learning-to-defer framework to multiple treatments via a novel greatest lower bound method, and introduce conformal policy learning, which bridges the gap between unobserved ground-truth optimal treatments and estimated optimal treatment rules. Drawing on insights from the noisy-label literature, we develop a randomness-injection approach that guarantees marginal coverage without requiring assumptions on underlying black-box optimal treatment rules. Through experiments on synthetic data and a real-world application to in vitro fertilization (IVF), we demonstrate that our methods produce robust and actionable policies that naturally incorporate clinical considerations while effectively balancing performance and reliability.
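To make the idea of a set-valued recommendation concrete, the following is a minimal sketch of generic split conformal prediction applied to treatment labels. It is not the paper's conformal policy learning procedure: it assumes that calibration labels for the "best" treatment are available (in practice these would be estimates, since true optimal treatments are unobserved), and it omits the randomness-injection step entirely. All names (`conformal_threshold`, `treatment_sets`, the toy probabilities) are hypothetical.

```python
import math
import numpy as np

def conformal_threshold(cal_scores, alpha):
    """Return the ceil((n+1)(1-alpha))-th smallest calibration score.

    If that rank exceeds n, no finite threshold gives the target
    coverage, so return +inf (every treatment is then included).
    """
    scores = np.sort(np.asarray(cal_scores, dtype=float))
    n = len(scores)
    k = math.ceil((n + 1) * (1 - alpha))
    return float("inf") if k > n else float(scores[k - 1])

def treatment_sets(probs, qhat):
    """Include treatment k whenever its score 1 - probs[i, k] is <= qhat."""
    return [np.flatnonzero(1.0 - p <= qhat).tolist() for p in probs]

# Toy calibration data: model probabilities over 3 treatments and
# (assumed, estimated) best-treatment labels for 4 patients.
cal_probs = np.array([
    [0.90, 0.05, 0.05],
    [0.30, 0.60, 0.10],
    [0.25, 0.25, 0.50],
    [0.40, 0.35, 0.25],
])
cal_labels = np.array([0, 1, 2, 0])

# Nonconformity score: one minus the probability of the labelled treatment.
cal_scores = 1.0 - cal_probs[np.arange(len(cal_labels)), cal_labels]
qhat = conformal_threshold(cal_scores, alpha=0.5)

# New patients: a confident case and an ambiguous case.
new_probs = np.array([
    [0.80, 0.15, 0.05],
    [0.50, 0.50, 0.00],
])
sets = treatment_sets(new_probs, qhat)
```

In this toy run the confident patient receives a single-treatment set and the ambiguous patient receives a two-treatment set, which illustrates how set size can signal decision ambiguity. The coverage guarantee of plain split conformal prediction holds with respect to the calibration labels used, which is exactly the gap between estimated and true optimal treatments that the abstract says the proposed method is designed to bridge.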

Problem

Research questions and friction points this paper is trying to address.

set-valued policy

treatment recommendation

uncertainty quantification

multiple treatments

decision ambiguity

Innovation

Methods, ideas, or system contributions that make the work stand out.

set-valued policy learning

conformal policy learning

learning-to-defer