Variable Selection in Convex Piecewise Linear Regression

📅 2024-11-04
🏛️ arXiv.org
📈 Citations: 0
Influential: 0

🤖 AI Summary
This paper addresses high-dimensional variable selection for convex piecewise-linear regression, specifically max-affine models of the form $x \mapsto \max_{j\in[k]} \langle a_j^\star, x \rangle + b_j^\star$. It proposes Sparse Gradient Descent (Sp-GD) and gives non-asymptotic local convergence guarantees for it under sub-Gaussian noise and sub-Gaussian covariates that satisfy an anti-concentration property. The contributions are threefold: (1) a non-asymptotic convergence analysis of Sp-GD, which reaches $\varepsilon$-accuracy with sample complexity $O\big(\max(\varepsilon^{-2}\sigma_z^2,1)\,s\log(d/s)\big)$; (2) an initialization scheme combining sparse PCA with an $r$-covering search, which under Gaussian covariates and noise needs $O\big(\varepsilon^{-2}\max(\sigma_z^4,\sigma_z^2,1)\,s^2\log^4 d\big)$ samples to produce an $\varepsilon$-accurate estimate inside the basin of attraction; and (3) exact parameter recovery from $O(s\log(d/s))$ noise-free samples. Monte Carlo simulations corroborate the theoretical findings for both Sp-GD and the initialization scheme.

📝 Abstract
This paper presents Sparse Gradient Descent as a solution for variable selection in convex piecewise linear regression where the model is given as $\mathrm{max}\langle a_j^\star, x \rangle + b_j^\star$ for $j = 1,\dots,k$ where $x \in \mathbb{R}^d$ is the covariate vector. Here, $\{a_j^\star\}_{j=1}^k$ and $\{b_j^\star\}_{j=1}^k$ denote the ground-truth weight vectors and intercepts. A non-asymptotic local convergence analysis is provided for Sp-GD under sub-Gaussian noise when the covariate distribution satisfies sub-Gaussianity and anti-concentration property. When the model order and parameters are fixed, Sp-GD provides an $\epsilon$-accurate estimate given $\mathcal{O}(\max(\epsilon^{-2}\sigma_z^2,1)s\log(d/s))$ observations where $\sigma_z^2$ denotes the noise variance. This also implies the exact parameter recovery by Sp-GD from $\mathcal{O}(s\log(d/s))$ noise-free observations. Since optimizing the squared loss for sparse max-affine is non-convex, an initialization scheme is proposed to provide a suitable initial estimate within the basin of attraction for Sp-GD, i.e. sufficiently accurate to invoke the convergence guarantees. The initialization scheme uses sparse principal component analysis to estimate the subspace spanned by $\{a_j^\star\}_{j=1}^k$ then applies an $r$-covering search to estimate the model parameters. A non-asymptotic analysis is presented for this initialization scheme when the covariates and noise samples follow Gaussian distributions. When the model order and parameters are fixed, this initialization scheme provides an $\epsilon$-accurate estimate given $\mathcal{O}(\epsilon^{-2}\max(\sigma_z^4,\sigma_z^2,1)s^2\log^4(d))$ observations. Numerical Monte Carlo results corroborate theoretical findings for Sp-GD and the initialization scheme.
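
The abstract describes Sp-GD only at a high level, so the following is a minimal sketch rather than the authors' algorithm. It assumes the weight vectors share one support of size `s`, enforces this by keeping the `s` columns of the weight matrix with the largest norm after each step, and takes plain gradient steps on the squared loss. The names `max_affine`, `hard_threshold_columns`, `sp_gd`, and the defaults `step` and `iters` are hypothetical.

```python
import numpy as np

def max_affine(X, A, b):
    """Evaluate max_j <a_j, x> + b_j for every row x of X (A is k-by-d)."""
    return np.max(X @ A.T + b, axis=1)

def hard_threshold_columns(A, s):
    """Assumed projection: keep the s variables (columns) with largest l2 norm."""
    keep = np.argsort(np.linalg.norm(A, axis=0))[-s:]
    out = np.zeros_like(A)
    out[:, keep] = A[:, keep]
    return out

def sp_gd(X, y, A0, b0, s, step=0.5, iters=300):
    """Gradient steps on the squared loss, each followed by hard thresholding."""
    n = X.shape[0]
    A, b = A0.copy(), b0.copy()
    k = A.shape[0]
    rows = np.arange(n)
    for _ in range(iters):
        scores = X @ A.T + b                 # (n, k) affine pieces
        active = np.argmax(scores, axis=1)   # piece attaining the max per sample
        resid = scores[rows, active] - y     # prediction error per sample
        W = np.zeros((n, k))
        W[rows, active] = resid              # route each residual to its active piece
        grad_A = W.T @ X / n                 # gradient of (1/2n) * sum of squared residuals
        grad_b = W.sum(axis=0) / n
        A = hard_threshold_columns(A - step * grad_A, s)
        b = b - step * grad_b
    return A, b
```

Because the loss is non-convex, a sketch like this is only expected to behave well when `A0` and `b0` already lie close to the ground truth, which is the role the abstract assigns to the initialization scheme.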
Problem

Research questions and friction points this paper is trying to address.

Variable selection in convex piecewise linear regression models
Parameter recovery for max-affine functions under noise
Transforming sparse polynomials into max-affine models
Innovation

Methods, ideas, or system contributions that make the work stand out.

Sparse Gradient Descent for variable selection
Initialization using sparse principal component analysis
Real Maslov Dequantization transforms polynomials to max-affine models
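
The link between polynomials and max-affine models can be illustrated with the standard log-sum-exp bound. This is a sketch of the general idea, not necessarily the construction used in the paper. The symbols $h$, $\alpha_j$, $c_j$, and $p_h$ are assumed notation: take a polynomial $p(x) = \sum_{j=1}^k c_j x^{\alpha_j}$ with positive coefficients, substitute $x_i = e^{u_i/h}$ and $c_j = e^{b_j/h}$ for some $h > 0$, and set $p_h(u) = h \log p(x)$.

```latex
% Assumed notation: h > 0 is the dequantization parameter, \alpha_j are the
% monomial exponent vectors, and b_j = h \log c_j are rescaled log-coefficients.
p_h(u) \;=\; h \log \sum_{j=1}^{k} \exp\!\Big(\tfrac{\langle \alpha_j, u\rangle + b_j}{h}\Big),
\qquad
\max_{j\in[k]} \big(\langle \alpha_j, u\rangle + b_j\big)
\;\le\; p_h(u) \;\le\;
\max_{j\in[k]} \big(\langle \alpha_j, u\rangle + b_j\big) + h \log k .
```

As $h \to 0$ the gap $h \log k$ vanishes, so $p_h$ converges uniformly to a max-affine function whose weight vectors are the exponents $\alpha_j$. If the polynomial involves only a few of the $d$ variables, those weight vectors are sparse with a common support.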