🤖 AI Summary
This study addresses hypothesis testing for linear functionals with arbitrary loading vectors in high-dimensional sparse linear regression, where the design covariance is unknown and only an upper bound on the sparsity is available. The authors propose a computationally efficient mixed test that controls Type I error uniformly over the sparse null, and they bound its separation distance against sparse alternatives. Key contributions include an information-theoretic lower bound calibrated to the magnitude profile of the loading vector; a characterization of the adaptive separation rate, up to logarithmic factors, for arbitrary loading vectors in the ultra-sparse regime; and matching upper and lower bounds for several classes of loading vectors in the moderately sparse regime. Furthermore, a low-degree polynomial lower bound and a polynomial-time reduction from sparse canonical correlation analysis (CCA) provide evidence that improving on the mixed test's rate, even where statistically possible, may be computationally hard.
📝 Abstract
We study the problem of testing $H_0: \xi^\top\beta=t_0$ in high-dimensional sparse linear regression with Gaussian random design and unknown design covariance. The loading vector $\xi$ is arbitrary, and the exact sparsity level $k$ is unknown but bounded by a known value $k_u$. Tests are required to control Type I error uniformly over the $k_u$-sparse null, while power is evaluated against $k$-sparse alternatives. We construct a computationally efficient mixed test that gives an upper bound on the adaptive separation distance and establish an information-theoretic lower bound calibrated to the magnitude profile of $\xi$. In the ultra-sparse regime $k_u\lesssim \sqrt{n}/\log p$, these bounds characterize the adaptive separation rate up to logarithmic factors for arbitrary $\xi$. In the moderately sparse regime $\sqrt{n}/\log p\ll k_u\lesssim n/\log p$, these bounds match for several classes of loading vectors but may differ in general. In this regime, we further prove a low-degree lower bound that matches the upper bound up to logarithmic factors. This provides evidence that improving on the rate of the mixed test, if statistically possible, may be computationally hard. For flat sparse loadings, we complement this evidence with a polynomial-time reduction from sparse CCA. Finally, we examine how information about the design covariance affects the adaptive separation rate in two settings. Under a sparse signed-spiked covariance model, the information-theoretic lower bound is attainable up to logarithmic factors by a computationally inefficient procedure, while the low-degree lower bound and sparse-CCA reduction continue to apply, providing evidence for a statistical-computational gap. When the design covariance is known and diagonal, the adaptive separation rate takes the same form as in the ultra-sparse regime.
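As a rough illustration of the two sparsity regimes named in the abstract, the sketch below classifies a sparsity bound $k_u$ given the sample size $n$ and dimension $p$. The abstract states the regime boundaries only up to constants, so the function name `sparsity_regime` and the constant `c` are hypothetical choices made for illustration. The strict separation $\ll$ is also simplified here to a plain threshold comparison.

```python
import math


def sparsity_regime(n: int, p: int, k_u: int, c: float = 1.0) -> str:
    """Classify k_u into the regimes described in the abstract.

    `c` is a hypothetical stand-in for the unspecified constant hidden in
    the order relations; the abstract does not give its value.
    """
    if n < 1 or p < 2 or k_u < 0:
        raise ValueError("need n >= 1, p >= 2 (so log p > 0), and k_u >= 0")

    ultra_bound = c * math.sqrt(n) / math.log(p)   # sqrt(n) / log p
    moderate_bound = c * n / math.log(p)           # n / log p

    if k_u <= ultra_bound:
        return "ultra-sparse"
    if k_u <= moderate_bound:
        return "moderately sparse"
    return "outside both regimes"


if __name__ == "__main__":
    # With n = 10000 and p = 1000 the thresholds are roughly 14.5 and 1447.6.
    for k_u in (10, 100, 2000):
        print(k_u, sparsity_regime(10_000, 1_000, k_u))
```

With `c = 1` the classification is only indicative, since the results in the abstract are asymptotic and hold up to constants and logarithmic factors.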