🤖 AI Summary
This paper addresses optimal subset selection in a Gaussian sequence model: given noisy observations $Y_i \sim \mathcal{N}(\mu_i, \sigma_i^2)$ with known $\sigma_i$, the goal is to select an index set $S$ that maximizes the utility $\frac{1}{n}\sum_{i\in S}(\mu_i - K_i)$, where the $K_i$ are known costs. To overcome bias and inefficiency in conventional utility estimators, the paper proposes ASSURE, an approximately unbiased estimator of a decision rule's expected utility that is grounded in Stein's Unbiased Risk Estimate (SURE) and integrates regularization and empirical Bayes principles. Optimizing ASSURE over a prespecified class of rules yields a rule whose utility is asymptotically no worse than that of the best rule in the class. The paper also establishes minimax-optimal convergence rates under both sparse and dense regimes. Empirical evaluations across applications, including census tract screening, identification of discriminating firms, and A/B testing, demonstrate substantial improvements over standard benchmarks such as the Benjamini–Hochberg procedure and hard-thresholding.
📝 Abstract
This paper proposes methods for producing compound selection decisions in a Gaussian sequence model. Given unknown, fixed parameters $\mu_{1:n}$ and known $\sigma_{1:n}$ with observations $Y_i \sim \textsf{N}(\mu_i, \sigma_i^2)$, the decision maker would like to select a subset of indices $S$ so as to maximize utility $\frac{1}{n}\sum_{i\in S} (\mu_i - K_i)$, for known costs $K_i$. Inspired by Stein's unbiased risk estimate (SURE), we introduce an almost unbiased estimator, called ASSURE, for the expected utility of a proposed decision rule. ASSURE allows a user to choose a welfare-maximizing rule from a pre-specified class by optimizing the estimated welfare, thereby producing selection decisions that borrow strength across noisy estimates. We show that ASSURE produces decision rules that are asymptotically no worse than the optimal but infeasible decision rule in the pre-specified class. We apply ASSURE to the selection of Census tracts for economic opportunity, the identification of discriminating firms, and the analysis of $p$-value decision procedures in A/B testing.
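To make the objective concrete, the following is a minimal simulation sketch of the utility $\frac{1}{n}\sum_{i\in S}(\mu_i - K_i)$ only; it does not implement ASSURE, whose formula is not given here. The function names `utility` and `threshold_rule`, the simulated values of $\mu_i$ and $\sigma_i$, and the zero costs are all illustrative assumptions. In practice the $\mu_i$ are unknown, which is why the realized utility of a rule has to be estimated rather than computed directly.

```python
import random


def utility(selected, mu, costs):
    """Average net benefit (1/n) * sum over selected i of (mu_i - K_i)."""
    n = len(mu)
    return sum(mu[i] - costs[i] for i in selected) / n


def threshold_rule(y, costs, c=0.0):
    """Hypothetical rule: select i when y_i exceeds its cost by more than c."""
    return [i for i in range(len(y)) if y[i] - costs[i] > c]


random.seed(0)
n = 1000
# Simulated stand-ins: in a real problem mu is unknown and fixed.
mu = [random.gauss(0.0, 1.0) for _ in range(n)]
sigma = [random.uniform(0.5, 2.0) for _ in range(n)]  # known noise levels
costs = [0.0] * n                                     # known costs K_i
y = [random.gauss(m, s) for m, s in zip(mu, sigma)]   # Y_i ~ N(mu_i, sigma_i^2)

# Infeasible oracle: select exactly the indices with positive net benefit.
oracle = [i for i in range(n) if mu[i] > costs[i]]
# Naive feasible rule: treat each noisy observation as if it were mu_i.
naive = threshold_rule(y, costs)

u_oracle = utility(oracle, mu, costs)
u_naive = utility(naive, mu, costs)
print(f"oracle utility: {u_oracle:.3f}, naive utility: {u_naive:.3f}")
```

The oracle set can never do worse than any other selection, so the gap between the two printed values is the utility lost to noise by the naive rule. Rules that borrow strength across observations aim to reduce that gap.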