🤖 AI Summary
This study addresses the challenge of conducting valid statistical inference for M-estimators after model selection with sparsity-inducing penalties such as the Lasso. The authors propose a novel noise-injection approach that asymptotically decouples model selection from subsequent inference: by adding carefully constructed Gaussian noise to the data and leveraging the approximate normality of the score statistic, the method enables valid post-selection inference without requiring bespoke inferential procedures. Standard regression tools (e.g., glm) suffice for implementation, significantly simplifying the workflow. Theoretical guarantees are established under mild distributional assumptions, and the approach is demonstrated in an empirical study of a social network, where it facilitates inference on the association between sex and smoking behavior after model selection.
📝 Abstract
We consider inference for M-estimators after model selection using a sparsity-inducing penalty. While existing methods for this task require bespoke inference procedures, we propose a simpler approach, which relies on two insights: (i) adding carefully constructed noise to, and subtracting it from, a Gaussian random variable with unknown mean and known variance yields two \emph{independent} Gaussian random variables; and (ii) both the selection event resulting from penalized M-estimation, and the event that a standard (non-selective) confidence interval for an M-estimator covers its target, can be characterized in terms of an approximately normal ``score variable''. We combine these insights to show that -- when the noise is chosen carefully -- there is asymptotic independence between the model selected using a noisy penalized M-estimator, and the event that a standard (non-selective) confidence interval on noisy data covers the selected parameter. Therefore, selecting a model via penalized M-estimation (e.g. \verb=glmnet= in \verb=R=) on noisy data, and then conducting \emph{standard} inference on the selected model (e.g. \verb=glm= in \verb=R=) using noisy data, yields valid inference: \emph{no bespoke methods are required}. Our results require independent observations, but impose only weak distributional assumptions. We apply the proposed approach to conduct inference on the association between sex and smoking in a social network.
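Insight (i) is a standard Gaussian randomization fact: if $X \sim N(\mu, \sigma^2)$ and, independently, $\omega \sim N(0, \gamma\sigma^2)$, then $U = X + \omega$ and $V = X - \omega/\gamma$ are jointly Gaussian with $\mathrm{Cov}(U, V) = \sigma^2 - (1/\gamma)\gamma\sigma^2 = 0$, hence independent. The simulation below is a minimal numerical sketch of this fact only; the symbols $\mu$, $\sigma$, $\gamma$ and the sample size are illustrative choices, not values from the paper, and the sketch does not implement the paper's selection-then-inference pipeline.

```python
import numpy as np

# Illustrative parameters (not from the paper): mean mu, standard
# deviation sigma of X, and noise-scale gamma for omega.
rng = np.random.default_rng(0)
mu, sigma, gamma = 2.0, 1.5, 0.5
n = 200_000

# X ~ N(mu, sigma^2); independent noise omega ~ N(0, gamma * sigma^2).
x = rng.normal(mu, sigma, size=n)
omega = rng.normal(0.0, np.sqrt(gamma) * sigma, size=n)

# Add the noise to form U, subtract a rescaled copy to form V.
u = x + omega          # a noisy copy, e.g. for model selection
v = x - omega / gamma  # a second noisy copy, e.g. for inference

# U and V are jointly Gaussian with zero covariance, so their sample
# correlation should be near zero, and both should be centered at mu.
corr = np.corrcoef(u, v)[0, 1]
print(f"corr(U, V) = {corr:.4f}, mean(U) = {u.mean():.3f}, mean(V) = {v.mean():.3f}")
```

Because $U$ and $V$ are independent, selection performed on one noisy copy cannot condition the distribution of the other, which is what lets standard (non-selective) intervals computed on the second copy remain valid after selection on the first.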