🤖 AI Summary
Statistical inference for the value function of optimal treatment regimes is challenging because the functional that defines it is non-differentiable. Method: We propose a softmax-based smoothing estimator to construct valid confidence intervals without relying on parametric assumptions, strong margin conditions, or high-dimensional kernel density estimation. Under mild regularity conditions, the estimator achieves √n-consistency and asymptotic normality. Controlling the first-order bias and the second-order remainder terms keeps the estimator both computationally efficient and statistically robust, even in confounded causal optimization settings. Contribution/Results: The key innovation is embedding softmax smoothing within a nonparametric causal inference framework, which enables accurate and generalizable inference for non-differentiable optimal value functions without parametric restrictions or unrealistic margin assumptions. This substantially enhances the reliability and practical applicability of personalized treatment strategy evaluation.
📝 Abstract
Constructing confidence intervals for the value of an optimal treatment policy is an important problem in causal inference. Insight into the optimal policy value can guide the development of reward-maximizing, individualized treatment regimes. However, because the functional that defines the optimal value is non-differentiable, standard semi-parametric approaches for performing inference fail to be directly applicable. Existing approaches for handling this non-differentiability fall roughly into two camps. In one camp are estimators based on constructing smooth approximations of the optimal value. These approaches are computationally lightweight, but typically place unrealistic parametric assumptions on outcome regressions. In another camp are approaches that directly de-bias the non-smooth objective. These approaches don't place parametric assumptions on nuisance functions, but they either require the computation of intractably-many nuisance estimates, assume unrealistic $L^\infty$ nuisance convergence rates, or make strong margin assumptions that prohibit non-response to a treatment. In this paper, we revisit the problem of constructing smooth approximations of non-differentiable functionals. By carefully controlling first-order bias and second-order remainders, we show that a softmax smoothing-based estimator can be used to estimate parameters that are specified as a maximum of scores involving nuisance components. In particular, this includes the value of the optimal treatment policy as a special case. Our estimator obtains $\sqrt{n}$ convergence rates, avoids parametric restrictions/unrealistic margin assumptions, and is often statistically efficient.
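To make the smoothing idea concrete, the sketch below replaces a hard maximum over per-treatment scores with a softmax-weighted average and shows how the gap to the hard maximum shrinks as the temperature parameter grows. It is only an illustration of the general technique, not the paper's estimator: the function name `softmax_smoothed_max`, the parameter `beta`, and the simulated scores `mu_hat` are assumptions introduced here, and no bias correction or nuisance estimation is performed.

```python
import numpy as np


def softmax_smoothed_max(scores, beta):
    """Softmax-weighted average of scores along the last axis.

    Hypothetical helper for illustration. For K scores and beta > 0 it
    satisfies  max - log(K) / beta <= result <= max.
    """
    scores = np.asarray(scores, dtype=float)
    # Subtract the row-wise maximum before exponentiating for numerical stability.
    shifted = beta * (scores - scores.max(axis=-1, keepdims=True))
    weights = np.exp(shifted)
    weights /= weights.sum(axis=-1, keepdims=True)
    return (weights * scores).sum(axis=-1)


rng = np.random.default_rng(0)
# Simulated stand-in for estimated mean outcomes under 3 treatments for 1000 units.
mu_hat = rng.normal(size=(1000, 3))

# Plug-in value of the (non-smooth) optimal policy: average of the row-wise maximum.
hard_value = mu_hat.max(axis=1).mean()

for beta in (1.0, 10.0, 100.0):
    smooth_value = softmax_smoothed_max(mu_hat, beta).mean()
    # The smoothing gap is nonnegative and at most log(3) / beta.
    print(f"beta={beta:6.1f}  smooth={smooth_value:.4f}  gap={hard_value - smooth_value:.4f}")
```

A larger `beta` reduces the smoothing bias but makes the surrogate less smooth, which is the trade-off that the abstract's control of first-order bias and second-order remainders is meant to address.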