🤖 AI Summary
This paper addresses high-dimensional variable selection for convex piecewise-linear regression, specifically max-affine models of the form $x \mapsto \max_{j\in[k]} \langle a_j^\star, x\rangle + b_j^\star$. We propose Sparse Gradient Descent (Sp-GD) and analyze its local convergence non-asymptotically under sub-Gaussian noise and sub-Gaussian, anti-concentrated covariates. Our contributions are threefold: (1) a non-asymptotic local convergence analysis of Sp-GD, which achieves $\varepsilon$-accuracy with sample complexity $O\big(\max(\varepsilon^{-2}\sigma_z^2,1)\,s\log(d/s)\big)$; (2) an initialization scheme combining sparse PCA with an $r$-covering search, which under Gaussian covariates and noise yields an $\varepsilon$-accurate initial estimate from $O\big(\varepsilon^{-2}\max(\sigma_z^4,\sigma_z^2,1)\,s^2\log^4 d\big)$ samples; and (3) exact parameter recovery from $O(s\log(d/s))$ noiseless samples. Monte Carlo simulations corroborate the theoretical results.
📝 Abstract
This paper presents Sparse Gradient Descent (Sp-GD) as a solution for variable selection in convex piecewise linear regression, where the model is given as $\max_{j} \langle a_j^\star, x\rangle + b_j^\star$ over $j = 1,\dots,k$ and $x \in \mathbb{R}^d$ is the covariate vector. Here, $\{a_j^\star\}_{j=1}^k$ and $\{b_j^\star\}_{j=1}^k$ denote the ground-truth weight vectors and intercepts. A non-asymptotic local convergence analysis is provided for Sp-GD under sub-Gaussian noise when the covariate distribution satisfies sub-Gaussianity and an anti-concentration property. When the model order and parameters are fixed, Sp-GD provides an $\epsilon$-accurate estimate given $\mathcal{O}(\max(\epsilon^{-2}\sigma_z^2,1)\,s\log(d/s))$ observations, where $\sigma_z^2$ denotes the noise variance. This also implies exact parameter recovery by Sp-GD from $\mathcal{O}(s\log(d/s))$ noise-free observations. Since optimizing the squared loss for sparse max-affine models is non-convex, an initialization scheme is proposed to provide a suitable initial estimate within the basin of attraction for Sp-GD, i.e., one sufficiently accurate to invoke the convergence guarantees. The initialization scheme uses sparse principal component analysis to estimate the subspace spanned by $\{a_j^\star\}_{j=1}^k$, then applies an $r$-covering search to estimate the model parameters. A non-asymptotic analysis is presented for this initialization scheme when the covariates and noise samples follow Gaussian distributions. When the model order and parameters are fixed, this initialization scheme provides an $\epsilon$-accurate estimate given $\mathcal{O}(\epsilon^{-2}\max(\sigma_z^4,\sigma_z^2,1)\,s^2\log^4(d))$ observations. Numerical Monte Carlo results corroborate the theoretical findings for Sp-GD and the initialization scheme.
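To make the setting concrete, the following is a minimal NumPy sketch of a sparse max-affine model and one sparsity-constrained gradient step on the squared loss. The abstract does not spell out the Sp-GD update, so the details here are assumptions: the step is a plain gradient step on $\frac{1}{2n}\sum_i (\max_j \langle a_j, x_i\rangle + b_j - y_i)^2$, followed by a hard-thresholding projection that keeps the $s$ coordinates with the largest joint energy across the $k$ pieces. The function names (`max_affine`, `hard_threshold`, `sparse_gd_step`), the step size, and the perturbed-truth initialization are all hypothetical and stand in for the paper's actual algorithm and its sparse-PCA initialization.

```python
import numpy as np


def max_affine(X, A, b):
    """Evaluate max_j <a_j, x> + b_j for each row x of X.

    X: (n, d) covariates, A: (k, d) weights, b: (k,) intercepts.
    Returns the model values (n,) and the index of the active piece (n,).
    """
    scores = X @ A.T + b                      # (n, k) affine pieces
    idx = scores.argmax(axis=1)               # active piece per sample
    return scores[np.arange(len(X)), idx], idx


def hard_threshold(A, s):
    """Keep the s coordinates with largest column norm of A, zero the rest.

    Assumed projection (joint support shared by all k pieces).
    """
    energy = np.linalg.norm(A, axis=0)        # (d,) energy per coordinate
    keep = np.argsort(energy)[-s:]            # indices of the s largest
    out = np.zeros_like(A)
    out[:, keep] = A[:, keep]
    return out


def sparse_gd_step(X, y, A, b, s, step):
    """One gradient step on the squared loss, then hard thresholding of A."""
    n, k = len(y), A.shape[0]
    pred, idx = max_affine(X, A, b)
    resid = pred - y                          # (n,) residuals
    onehot = np.zeros((n, k))
    onehot[np.arange(n), idx] = 1.0           # each sample updates its active piece
    W = onehot * resid[:, None]               # (n, k) residuals routed to pieces
    grad_A = W.T @ X / n                      # (k, d)
    grad_b = W.sum(axis=0) / n                # (k,)
    return hard_threshold(A - step * grad_A, s), b - step * grad_b


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, d, k, s = 2000, 100, 3, 5
    support = rng.choice(d, size=s, replace=False)
    A_star = np.zeros((k, d))
    A_star[:, support] = rng.standard_normal((k, s))
    b_star = rng.standard_normal(k)
    X = rng.standard_normal((n, d))           # Gaussian covariates
    y, _ = max_affine(X, A_star, b_star)      # noise-free observations

    # Hypothetical initialization: a small perturbation of the truth,
    # standing in for an estimate inside the basin of attraction.
    A = hard_threshold(A_star + 0.1 * rng.standard_normal((k, d)), s)
    b = b_star + 0.1 * rng.standard_normal(k)
    for _ in range(200):
        A, b = sparse_gd_step(X, y, A, b, s, step=0.5)
    err = np.linalg.norm(A - A_star) + np.linalg.norm(b - b_star)
    print("parameter error:", err)
```

Because the loss is non-convex, a sketch like this is only meaningful when started close to the ground truth, which is the role the abstract assigns to the initialization scheme.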