🤖 AI Summary
This study addresses statistical inference in high-dimensional linear models whose regression coefficients are collectively dense yet individually weak, a setting in which conventional sparsity-based methods can break down. The authors propose Neighborhood-Localized Nested Regression (NLNR), a framework that exploits sparse conditional dependence among the covariates rather than sparsity of the coefficients. For a given target covariate, the method estimates a Sparse Conditional Neighborhood (SCN) through nodewise ℓ₁-penalized regression, then fits a low-dimensional working regression on the target covariate and its estimated neighborhood to obtain coordinatewise inference. Theoretical analysis establishes consistency and asymptotic normality of the resulting estimator under suitable regularity conditions. Two extensions build on this reduction: a thresholding-based screening procedure with theoretical guarantees, and a boosting variant that adds response-relevant covariates to the working model to improve finite-sample performance. Simulation studies and an application to the CCLE dataset show favorable empirical performance.
📝 Abstract
High-dimensional inference methods often rely on coefficient sparsity, an assumption that can be restrictive when signals are dense but individually weak. In such settings, valid inference may still be possible if the covariates exhibit sparse conditional dependence. Motivated by this observation, we propose Neighborhood-Localized Nested Regression (NLNR), a framework for coordinatewise inference in high-dimensional linear models with potentially dense coefficients. The central idea is to localize inference for a target coefficient to a low-dimensional working regression determined by a Sparse Conditional Neighborhood (SCN) of the target covariate. Specifically, for a given covariate, we estimate its SCN through nodewise $\ell_1$-penalized regression and then fit a regression using only the target covariate and its estimated neighborhood. Under suitable regularity conditions, we establish consistency and asymptotic normality of the resulting estimator. Building on this inferential reduction principle, we further develop a thresholding-based screening procedure with theoretical guarantees and a boosting variant that augments the working model with additional response-relevant covariates to improve finite-sample performance. Extensive simulations and an application to the CCLE dataset demonstrate favorable empirical performance.
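
The two-step reduction described above (nodewise $\ell_1$-penalized regression to estimate the SCN, then a low-dimensional regression on the target covariate and its neighborhood) can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation. The function names `lasso_cd`, `nlnr_estimate`, and `simulate_ar1`, the penalty level `lam=0.15`, the Gaussian AR(1) design, and the classical OLS standard error are all choices made here for the example. The abstract does not specify the paper's tuning rule or variance estimator, and the thresholding and boosting variants are not shown.

```python
import numpy as np


def lasso_cd(X, y, lam, n_iter=100):
    """Plain coordinate descent for (1/(2n))*||y - X b||^2 + lam*||b||_1."""
    n, p = X.shape
    b = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    r = y.astype(float).copy()           # running residual y - X b
    for _ in range(n_iter):
        for k in range(p):
            if col_sq[k] == 0.0:
                continue
            r += X[:, k] * b[k]           # partial residual without coordinate k
            rho = X[:, k] @ r / n
            b[k] = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_sq[k]
            r -= X[:, k] * b[k]           # restore the full residual
    return b


def nlnr_estimate(X, y, j, lam):
    """Hypothetical sketch: estimate beta_j from a neighborhood-localized regression."""
    Xc = X - X.mean(axis=0)               # center so no intercept is needed
    yc = y - y.mean()
    n, p = Xc.shape
    others = np.array([k for k in range(p) if k != j])
    # Step 1: nodewise lasso of X_j on all other covariates; the support
    # of the fitted coefficients is the estimated neighborhood.
    gamma = lasso_cd(Xc[:, others], Xc[:, j], lam)
    nbrs = others[gamma != 0]
    # Step 2: least squares of y on the target covariate plus its neighborhood.
    cols = np.concatenate(([j], nbrs)).astype(int)
    W = Xc[:, cols]
    coef, *_ = np.linalg.lstsq(W, yc, rcond=None)
    resid = yc - W @ coef
    # Classical OLS standard error (an assumption of this sketch).
    sigma2 = resid @ resid / (n - W.shape[1] - 1)
    se = np.sqrt(sigma2 * np.linalg.inv(W.T @ W)[0, 0])
    return coef[0], se, nbrs


def simulate_ar1(n, p, rho, rng):
    """Gaussian AR(1) covariates: X_k depends on the rest only through X_{k-1}, X_{k+1}."""
    X = np.empty((n, p))
    X[:, 0] = rng.standard_normal(n)
    for k in range(1, p):
        X[:, k] = rho * X[:, k - 1] + np.sqrt(1 - rho ** 2) * rng.standard_normal(n)
    return X


rng = np.random.default_rng(0)
n, p, rho, j = 2000, 30, 0.5, 10
X = simulate_ar1(n, p, rho, rng)
beta = rng.uniform(-0.2, 0.2, size=p)     # dense, individually weak coefficients
y = X @ beta + rng.standard_normal(n)

est, se, nbrs = nlnr_estimate(X, y, j, lam=0.15)
print(f"true={beta[j]:.3f}  est={est:.3f}  se={se:.3f}  neighborhood={sorted(nbrs.tolist())}")
```

In this toy design every coefficient is nonzero, so coefficient sparsity fails, yet each covariate has only two conditional neighbors. The omitted covariates are conditionally independent of the target given its neighborhood, so the working regression still targets $\beta_j$. Their contribution is absorbed into the working-model noise, which inflates the standard error without biasing the estimate.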