AI Summary
This paper addresses the problem of testing the significance of an individual regression coefficient in high-dimensional linear models where $p \asymp n$, that is, where the covariate dimension can be a constant fraction of the sample size. The goal is finite-sample Type I error control without strong distributional assumptions on the noise, such as Gaussianity or light tails. To this end, the paper proposes the Residual Permutation Test (RPT): regression residuals are projected onto the space orthogonal to the union of the column spaces of the original and permuted design matrices, and the test is built from these projected residuals. RPT provably achieves finite-sample size control under a fixed design with only exchangeable noise, whenever $p < n/2$. Theoretically, it is shown to be asymptotically powerful under heavy-tailed noise with a bounded $(1+t)$-th moment when the true coefficient is at least of order $n^{-t/(1+t)}$, and this signal size requirement is essentially minimax rate-optimal. Numerical experiments indicate that RPT performs well under both Gaussian and heavy-tailed noise.
Abstract
We consider the problem of testing whether a single coefficient is equal to zero in linear models when the dimension of covariates $p$ can be up to a constant fraction of the sample size $n$. In this regime, an important goal is to construct tests with valid finite-sample size control without requiring the noise to satisfy strong distributional assumptions. In this paper, we propose a new method, called the residual permutation test (RPT), which is constructed by projecting the regression residuals onto the space orthogonal to the union of the column spaces of the original and permuted design matrices. RPT can be proved to achieve finite-population size validity under fixed design with just exchangeable noise, whenever $p < n/2$. Moreover, RPT is shown to be asymptotically powerful for heavy-tailed noise with a bounded $(1+t)$-th order moment when the true coefficient is at least of order $n^{-t/(1+t)}$ for $t \in [0,1]$. We further prove that this signal size requirement is essentially rate-optimal in the minimax sense. Numerical studies confirm that RPT performs well in a wide range of simulation settings with normal and heavy-tailed noise distributions.
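The projection step can be illustrated with a minimal NumPy sketch. It is not the authors' algorithm: it does not reproduce the RPT statistic or p-value, and it omits the tested covariate because its coefficient is zero under the null. It only checks the property the construction relies on. If the columns of $V$ are orthonormal and orthogonal to both $X$ and a permuted copy $PX$, then $V^\top Y$ and $V^\top P Y$ depend on the noise alone, which is why $p < n/2$ is needed for $V$ to be non-trivial. The helper name `orthogonal_complement_basis`, the simulated data, and the choice of $t$-distributed noise are all assumptions made for illustration.

```python
import numpy as np

def orthogonal_complement_basis(X, PX, tol=1e-10):
    """Orthonormal basis of the space orthogonal to the columns of X and PX."""
    A = np.hstack([X, PX])                           # n x 2p stacked design
    U, s, _ = np.linalg.svd(A, full_matrices=True)   # U is n x n
    rank = int(np.sum(s > tol * s[0]))               # numerical rank of [X, PX]
    return U[:, rank:]                               # remaining n - rank columns

rng = np.random.default_rng(0)
n, p = 40, 10                                # illustrative sizes with p < n/2
X = rng.standard_normal((n, p))              # nuisance covariates (fixed design)
beta = rng.standard_normal(p)                # nuisance coefficients
eps = rng.standard_t(df=2, size=n)           # heavy-tailed noise (assumed example)
Y = X @ beta + eps                           # null model: tested coefficient is 0

perm = rng.permutation(n)                    # one permutation P of the rows
PX = X[perm]                                 # permuted design matrix P X
V = orthogonal_complement_basis(X, PX)

# Projecting with V removes X @ beta from Y and (P X) @ beta from P Y,
# so both projected vectors are functions of the noise only.
proj_Y_matches_noise = np.allclose(V.T @ Y, V.T @ eps)
proj_PY_matches_noise = np.allclose(V.T @ Y[perm], V.T @ eps[perm])
print(V.shape, proj_Y_matches_noise, proj_PY_matches_noise)
```

Because the projected original and permuted responses are functions of the noise only, exchangeability of the noise is what makes a comparison between them meaningful, with no Gaussian or moment assumption needed for the size guarantee.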