Policy learning under constraint: Maximizing a primary outcome while controlling an adverse event

📅 2026-01-30
📈 Citations: 0
Influential: 0
🤖 AI Summary
This study addresses personalized treatment decision-making in multi-outcome medical settings, where conventional approaches optimize only primary efficacy and neglect the risk of adverse events, thereby failing to satisfy safety constraints. To overcome this limitation, the authors propose PLUC (Policy Learning Under Constraint), which learns constrained treatment policies from observational data. Building on the EP-learning framework, PLUC employs an alternating optimization scheme based on a strongly convex Lagrangian to simultaneously maximize primary utility and control the probability of adverse events. The method applies the Frank-Wolfe algorithm to iteratively optimize over a convex hull of functions and incorporates targeting steps to ensure policy smoothness. Numerical experiments with the accompanying R package, PLUC-R, show that PLUC outperforms existing unconstrained methods, achieving therapeutic efficacy while adhering to the safety constraint.
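To make the alternating scheme concrete, here is a minimal sketch of Frank-Wolfe over the convex hull of a finite dictionary of candidate policies, minimizing a penalized Lagrangian-style criterion. Everything below (the base policies, the plug-in benefit/risk estimates, `criterion`, `lam`, `alpha`) is a hypothetical stand-in, not the PLUC-R API, and the paper's targeting step is omitted.

```python
# Sketch: Frank-Wolfe over the convex hull of candidate policies,
# minimizing -value + penalty on estimated adverse-event risk.
# All quantities are simulated stand-ins for illustration only.
import numpy as np

rng = np.random.default_rng(0)
n, d, K = 500, 3, 8

X = rng.normal(size=(n, d))
# Hypothetical plug-in estimates: per-patient benefit of treatment
# (CATE on the primary outcome) and adverse-event risk increase.
tau_benefit = X[:, 0] + 0.5 * rng.normal(size=n)
risk_pos = np.clip(0.3 + 0.4 * X[:, 1], 0.0, None)

# Dictionary of K base policies (random linear threshold rules);
# each column gives that rule's 0/1 decisions on the sample.
W = rng.normal(size=(K, d))
base_policies = (X @ W.T > 0).astype(float)  # n x K

def criterion(pi, lam=2.0, alpha=0.2):
    """Penalized criterion: negative mean utility plus a convex penalty
    on the estimated adverse-event probability exceeding alpha."""
    value = np.mean(pi * tau_benefit)
    risk = np.mean(pi * risk_pos)
    return -value + lam * max(risk - alpha, 0.0) ** 2

def grad(pi, lam=2.0, alpha=0.2):
    """Gradient of the criterion with respect to the policy vector pi."""
    risk = np.mean(pi * risk_pos)
    return -tau_benefit / n + 2.0 * lam * max(risk - alpha, 0.0) * risk_pos / n

# Frank-Wolfe: the linear minimization oracle over a convex hull just
# picks the vertex (base policy) most aligned with the negative gradient.
pi = base_policies.mean(axis=1)              # start at the barycenter
for t in range(200):
    g = grad(pi)
    s = base_policies[:, np.argmin(base_policies.T @ g)]  # LMO step
    gamma = 2.0 / (t + 2.0)                  # standard FW step size
    pi = (1.0 - gamma) * pi + gamma * s      # stay in the convex hull

print(f"criterion after Frank-Wolfe: {criterion(pi):.4f}")
```

Note that the iterate stays a convex combination of deterministic rules, so `pi` is a vector of treatment probabilities; this is one way to read the "smoothed policies" the abstract refers to.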

📝 Abstract
A medical policy aims to support decision-making by mapping patient characteristics to individualized treatment recommendations. Standard approaches typically optimize a single outcome criterion. For example, recommending treatment according to the sign of the Conditional Average Treatment Effect (CATE) maximizes the policy "value" by exploiting treatment effect heterogeneity. This point of view shifts policy learning towards the challenge of learning a reliable CATE estimator. However, in multi-outcome settings, such strategies ignore the risk of adverse events, despite their relevance. PLUC (Policy Learning Under Constraint) addresses this challenge by learning an estimator of the CATE that yields smoothed policies controlling the probability of an adverse event in observational settings. Inspired by insights from EP-learning, PLUC involves the optimization of strongly convex Lagrangian criteria over a convex hull of functions. Its alternating procedure iteratively applies the Frank-Wolfe algorithm to minimize the current criterion, then performs a targeting step that updates the criterion so that its evaluations at previously visited landmarks become targeted estimators of the corresponding theoretical quantities. An R package, PLUC-R, provides a practical implementation. We illustrate PLUC's performance through a series of numerical experiments.
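Read as a hedged sketch (the symbols below are illustrative, not the paper's notation), the problem the abstract describes is

$$
\max_{\pi \in \Pi} \; V(\pi) \quad \text{subject to} \quad R(\pi) \le \alpha,
$$

where $V(\pi)$ is the policy value for the primary outcome, $R(\pi)$ the probability of the adverse event under policy $\pi$, and $\alpha$ a tolerated risk level, with associated Lagrangian

$$
\mathcal{L}(\pi, \lambda) = -V(\pi) + \lambda \big(R(\pi) - \alpha\big), \qquad \lambda \ge 0.
$$

The unconstrained CATE rule mentioned above corresponds to $\pi(x) = \mathbf{1}\{\tau(x) > 0\}$ with $\tau(x) = \mathbb{E}[Y(1) - Y(0) \mid X = x]$; the paper's criteria additionally include terms that make them strongly convex, which this plain Lagrangian omits.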
Problem

Research questions and friction points this paper is trying to address.

policy learning
adverse event
constraint
multi-outcome
CATE
Innovation

Methods, ideas, or system contributions that make the work stand out.

Policy Learning Under Constraint
Conditional Average Treatment Effect
Adverse Event Control
Frank-Wolfe Algorithm
Lagrangian Optimization