AI Summary
This work addresses the challenges of identifying relevant variables and suppressing noise in nonlinear feature learning. We propose a coordinate-wise reweighted composite kernel ridge regression framework. Methodologically, we integrate variational analysis with variable selection theory to systematically characterize how different kernels affect feature recoverability: we prove that $\ell_1$-type kernels (e.g., the Laplace kernel) enable exact recovery of nonlinearly relevant features at stationary points, whereas Gaussian kernels guarantee recovery of only linearly relevant features. Furthermore, under Gaussian noise, both global optima and stationary points achieve variable screening consistency, i.e., they asymptotically eliminate irrelevant coordinates while correctly identifying the truly relevant ones. This work establishes the first unified analytical framework for kernel-based nonlinear feature selection that simultaneously provides rigorous theoretical guarantees and mechanistic interpretation.
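As a point of reference, here is a minimal sketch of the kind of variational objective described above, assuming a squared loss with an RKHS penalty (the exact formulation is not stated in this summary):

$$
\min_{w \in \mathbb{R}^d,\; f \in \mathcal{H}_k} \;\; \frac{1}{n} \sum_{i=1}^{n} \bigl( y_i - f(w \odot x_i) \bigr)^2 + \lambda \| f \|_{\mathcal{H}_k}^2,
$$

where $w \odot x$ denotes the coordinate-wise reweighting of the input and $\mathcal{H}_k$ is the RKHS of the chosen kernel $k$; driving a weight $w_j$ to zero eliminates coordinate $j$ from the predictor.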
Abstract
We study a compositional variant of kernel ridge regression in which the predictor is applied to a coordinate-wise reweighting of the inputs. Formulated as a variational problem, this model provides a simple testbed for feature learning in compositional architectures. From the perspective of variable selection, we show how relevant variables are recovered while noise variables are eliminated. We establish guarantees showing that both global minimizers and stationary points discard noise coordinates when the noise variables are Gaussian distributed. A central finding is that $\ell_1$-type kernels, such as the Laplace kernel, succeed in recovering features contributing to nonlinear effects at stationary points, whereas Gaussian kernels recover only linear ones.
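To make the contrast concrete, below is a minimal NumPy sketch of kernel ridge regression on coordinate-wise reweighted inputs, for both a Laplace ($\ell_1$-type) and a Gaussian kernel. The function names, the exact kernel parameterization, and the fixed weight vector are illustrative assumptions, not the paper's implementation:

```python
import numpy as np

def reweighted_laplace_kernel(X1, X2, w):
    """ell_1-type kernel on reweighted inputs: k(x, x') = exp(-||w * (x - x')||_1)."""
    diff = X1[:, None, :] * w - X2[None, :, :] * w
    return np.exp(-np.abs(diff).sum(axis=-1))

def reweighted_gaussian_kernel(X1, X2, w):
    """Gaussian kernel on reweighted inputs: k(x, x') = exp(-||w * (x - x')||_2^2)."""
    diff = X1[:, None, :] * w - X2[None, :, :] * w
    return np.exp(-(diff ** 2).sum(axis=-1))

def krr_predict(X_train, y_train, X_test, w, lam, kernel):
    """Kernel ridge regression fitted on coordinate-wise reweighted inputs."""
    K = kernel(X_train, X_train, w)
    alpha = np.linalg.solve(K + lam * np.eye(len(y_train)), y_train)
    return kernel(X_test, X_train, w) @ alpha

# Toy usage: only the first coordinate is relevant; the other four are Gaussian noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = np.sin(2.0 * X[:, 0]) + 0.1 * rng.normal(size=200)
w = np.array([1.0, 0.0, 0.0, 0.0, 0.0])  # hypothetical weights that have screened out the noise coordinates
preds = krr_predict(X, y, X, w, lam=1e-3, kernel=reweighted_laplace_kernel)
```

In the framework studied here, the weights $w$ would be optimized jointly with the predictor rather than held fixed; the sketch fixes $w$ only to show how a sparse weight vector eliminates noise coordinates from the kernel.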