Variable Selection in Convex Piecewise Linear Regression

📅 2024-11-04

🏛️ arXiv.org

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This paper addresses high-dimensional variable selection for convex piecewise-linear regression, specifically max-affine models of the form $x \mapsto \max_{j\in[k]} \big(\langle a_j^\star, x \rangle + b_j^\star\big)$. We propose Sparse Gradient Descent (Sp-GD), an algorithm with non-asymptotic local convergence guarantees under sub-Gaussian noise and covariates. Our contributions are threefold. (1) We give a non-asymptotic convergence analysis of Sp-GD, which achieves $\varepsilon$-accuracy with sample complexity $O\big(\max(\varepsilon^{-2}\sigma_z^2, 1)\, s\log(d/s)\big)$. (2) We propose a novel initialization scheme that combines sparse PCA with an $r$-covering search. Under Gaussian covariates and noise, it needs only $O\big(\varepsilon^{-2}\max(\sigma_z^4, \sigma_z^2, 1)\, s^2\log^4 d\big)$ samples to reach an $\varepsilon$-accurate estimate inside the basin of convergence. (3) We show exact parameter recovery from merely $O(s\log(d/s))$ noiseless samples. Monte Carlo simulations corroborate the theoretical findings.
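
For reference, the estimation problem behind this summary can be written out as below, together with the (sub)gradient that a gradient method would use. This is a sketch under assumptions that the summary does not state: the additive noise $z_i$, the per-vector sparsity constraint $\|a_j\|_0 \le s$, and the $1/(2n)$ normalization are illustrative choices.

```latex
% Assumed observation model (additive noise z_i with variance \sigma_z^2)
y_i = \max_{j \in [k]} \big( \langle a_j^\star, x_i \rangle + b_j^\star \big) + z_i, \qquad i = 1, \dots, n

% Assumed sparse least-squares formulation (non-convex: max inside a square, plus an l0 constraint)
\min_{\{(a_j, b_j)\}_{j=1}^k} \; \frac{1}{2n} \sum_{i=1}^n \Big( \max_{j \in [k]} \big( \langle a_j, x_i \rangle + b_j \big) - y_i \Big)^2
\quad \text{s.t.} \quad \|a_j\|_0 \le s, \; j \in [k]

% (Sub)gradient: sample i only contributes to the piece j that attains the max at x_i
r_i = \max_{l \in [k]} \big( \langle a_l, x_i \rangle + b_l \big) - y_i, \qquad
\nabla_{a_j} = \frac{1}{n} \sum_{i \,:\, j \text{ active at } x_i} r_i \, x_i, \qquad
\nabla_{b_j} = \frac{1}{n} \sum_{i \,:\, j \text{ active at } x_i} r_i
```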

📝 Abstract

This paper presents Sparse Gradient Descent (Sp-GD) as a solution for variable selection in convex piecewise linear regression. The model is $\max_{j \in [k]} \big(\langle a_j^\star, x \rangle + b_j^\star\big)$, where $x \in \mathbb{R}^d$ is the covariate vector and $\{a_j^\star\}_{j=1}^k$ and $\{b_j^\star\}_{j=1}^k$ denote the ground-truth weight vectors and intercepts. A non-asymptotic local convergence analysis is provided for Sp-GD under sub-Gaussian noise when the covariate distribution satisfies sub-Gaussianity and an anti-concentration property. When the model order and parameters are fixed, Sp-GD provides an $\epsilon$-accurate estimate given $\mathcal{O}(\max(\epsilon^{-2}\sigma_z^2,1)\,s\log(d/s))$ observations, where $\sigma_z^2$ denotes the noise variance. This also implies exact parameter recovery by Sp-GD from $\mathcal{O}(s\log(d/s))$ noise-free observations. Since optimizing the squared loss for sparse max-affine regression is non-convex, an initialization scheme is proposed to provide a suitable initial estimate within the basin of attraction for Sp-GD, i.e., one accurate enough to invoke the convergence guarantees. The initialization scheme uses sparse principal component analysis to estimate the subspace spanned by $\{a_j^\star\}_{j=1}^k$, then applies an $r$-covering search to estimate the model parameters. A non-asymptotic analysis is presented for this initialization scheme when the covariates and noise samples follow Gaussian distributions. When the model order and parameters are fixed, this initialization scheme provides an $\epsilon$-accurate estimate given $\mathcal{O}(\epsilon^{-2}\max(\sigma_z^4,\sigma_z^2,1)\,s^2\log^4 d)$ observations. Numerical Monte Carlo results corroborate the theoretical findings for Sp-GD and the initialization scheme.
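
The abstract describes Sp-GD only at a high level. The sketch below assumes that each iteration takes a full-batch gradient step on the squared loss and then hard-thresholds every weight vector to its $s$ largest-magnitude entries. That amounts to projected gradient descent onto $s$-sparse vectors. Several pieces are hypothetical choices rather than the authors' settings:

- the step size and iteration count;
- the noiseless toy data;
- the perturbed-truth starting point, which merely stands in for the paper's sparse-PCA / $r$-covering initialization.

```python
import random


def predict(A, b, x):
    """Return (max_j <a_j, x> + b_j, index of the active affine piece)."""
    vals = [sum(a_l * x_l for a_l, x_l in zip(a, x)) + bj for a, bj in zip(A, b)]
    j = max(range(len(vals)), key=vals.__getitem__)
    return vals[j], j


def hard_threshold(v, s):
    """Keep the s largest-magnitude entries of v; zero out the rest."""
    keep = set(sorted(range(len(v)), key=lambda l: abs(v[l]), reverse=True)[:s])
    return [v[l] if l in keep else 0.0 for l in range(len(v))]


def sparse_gd(X, y, A0, b0, s, step=0.5, iters=300):
    """Assumed Sp-GD-style loop: gradient step on (1/2n)*sum(r_i^2), then s-sparse projection."""
    n, d, k = len(X), len(X[0]), len(A0)
    A = [hard_threshold(a, s) for a in A0]
    b = list(b0)
    for _ in range(iters):
        gA = [[0.0] * d for _ in range(k)]
        gb = [0.0] * k
        for x, yi in zip(X, y):
            f, j = predict(A, b, x)
            r = f - yi  # only the active piece j receives this sample's (sub)gradient
            for l in range(d):
                gA[j][l] += r * x[l] / n
            gb[j] += r / n
        A = [hard_threshold([a[l] - step * g[l] for l in range(d)], s)
             for a, g in zip(A, gA)]
        b = [bj - step * gj for bj, gj in zip(b, gb)]
    return A, b


# Hypothetical noiseless toy problem: d = 6, s = 2, k = 2, true support {0, 1}.
rng = random.Random(0)
d, s = 6, 2
A_true = [[1.0, 1.0, 0.0, 0.0, 0.0, 0.0], [-1.0, 1.0, 0.0, 0.0, 0.0, 0.0]]
b_true = [0.0, 0.5]
X = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(300)]
y = [predict(A_true, b_true, x)[0] for x in X]

# Hypothetical initialization: the truth perturbed by 0.1 in every coordinate.
A0 = [[a + 0.1 for a in row] for row in A_true]
b0 = [bj + 0.1 for bj in b_true]
A_hat, b_hat = sparse_gd(X, y, A0, b0, s)
print([[round(a, 3) for a in row] for row in A_hat], [round(v, 3) for v in b_hat])
```

Because every iterate is exactly $s$-sparse, the selected variables can be read directly off the nonzero entries of `A_hat`. In this sketch, that is where variable selection happens.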

Problem

Research questions and friction points this paper is trying to address.

Variable selection in convex piecewise linear regression models

Parameter recovery for max-affine functions under noise

Transforming sparse polynomials into max-affine models
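
One standard form of real Maslov dequantization relates sums of exponentials to a max. For $h > 0$, the quantity $h \log \sum_j \exp\big((\langle \alpha_j, u\rangle + \beta_j)/h\big)$ tends to $\max_j(\langle \alpha_j, u\rangle + \beta_j)$ as $h \to 0^+$. It always lies within $[0,\, h \log k]$ above that max, where $k$ is the number of terms.

Take a positive-coefficient polynomial $p(t) = \sum_j c_j t^{\alpha_j}$, substitute $t = e^{u}$, and set $\beta_j = \log c_j$. This gives the $h = 1$ case. A polynomial whose exponent vectors $\alpha_j$ are sparse therefore maps to a max-affine model whose slopes are sparse. We assume the paper's transform is of this type, since its exact construction is not given here. The example polynomial below is hypothetical.

```python
import math


def affine_values(terms, u):
    """<alpha_j, u> + beta_j for each term (alpha_j: exponent vector, beta_j: log-coefficient)."""
    return [sum(a * ui for a, ui in zip(alpha, u)) + beta for alpha, beta in terms]


def dequantized(terms, u, h):
    """h * log(sum_j exp((<alpha_j, u> + beta_j) / h)), computed stably by factoring out the max."""
    z = affine_values(terms, u)
    m = max(z)
    return m + h * math.log(sum(math.exp((zj - m) / h) for zj in z))


def max_affine(terms, u):
    """The h -> 0+ limit: max_j <alpha_j, u> + beta_j."""
    return max(affine_values(terms, u))


# Hypothetical sparse polynomial p(t) = 3 t1^2 t3 + 0.5 t2 + t1 t5 in 5 variables;
# each exponent vector touches at most 2 coordinates, so each affine slope is sparse.
terms = [([2, 0, 1, 0, 0], math.log(3.0)),
         ([0, 1, 0, 0, 0], math.log(0.5)),
         ([1, 0, 0, 0, 1], 0.0)]
u = [0.3, -1.2, 0.8, 0.0, 0.5]
for h in (1.0, 0.1, 0.01):
    # The gap shrinks with h and never exceeds h * log(number of terms).
    print(h, dequantized(terms, u, h) - max_affine(terms, u))
```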

Innovation

Methods, ideas, or system contributions that make the work stand out.

Sparse Gradient Descent for variable selection

Initialization using sparse principal component analysis

Real Maslov dequantization transforms sparse polynomials into max-affine models
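
The abstract does not say which sparse PCA algorithm the initialization uses, or which matrix it is applied to. As a named stand-in, the sketch below uses the truncated power method. It repeats $v \leftarrow \mathrm{normalize}(T_s(S v))$, where $T_s$ keeps the $s$ largest-magnitude entries, to extract one $s$-sparse leading eigenvector of a symmetric matrix $S$. Some parts are assumptions:

- The spiked toy matrix, with its small symmetric perturbation, is hypothetical.
- Estimating the full $k$-dimensional subspace would need several components (for example via deflation), which this sketch omits.
- The $r$-covering search that follows it is not shown.

```python
import math


def hard_threshold(v, s):
    """Keep the s largest-magnitude entries of v; zero out the rest."""
    keep = set(sorted(range(len(v)), key=lambda l: abs(v[l]), reverse=True)[:s])
    return [v[l] if l in keep else 0.0 for l in range(len(v))]


def normalize(v):
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def truncated_power(S, s, iters=50):
    """Leading s-sparse eigenvector of symmetric S via v <- normalize(T_s(S v))."""
    d = len(S)
    v = normalize([1.0] * d)  # simple dense starting vector
    for _ in range(iters):
        Sv = [sum(S[i][l] * v[l] for l in range(d)) for i in range(d)]
        v = normalize(hard_threshold(Sv, s))
    return v


# Hypothetical spiked matrix S = I + 2 u u^T + (small symmetric perturbation),
# standing in for whatever moment matrix the initialization actually feeds to sparse PCA.
d, s = 8, 2
u = normalize([1.0, 1.0] + [0.0] * (d - 2))
S = [[(1.0 if i == l else 0.0) + 2.0 * u[i] * u[l] + 0.05 * math.cos(i + l)
      for l in range(d)] for i in range(d)]
v_hat = truncated_power(S, s)
print([round(x, 3) for x in v_hat])
```

Hard-thresholding inside each power step keeps the estimate exactly $s$-sparse. The recovered support therefore doubles as a candidate set of relevant variables for the later search.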

🔎 Similar Papers

Unsupervised Machine Learning Hybrid Approach Integrating Linear Programming in Loss Function: A Robust Optimization Technique