🤖 AI Summary
This work establishes a deterministic equivalent for the two-point function of a random matrix resolvent and uses it to characterize the asymptotic generalization performance of high-dimensional linear models trained with stochastic gradient descent (SGD), including high-dimensional linear regression, kernel regression, and random feature models. The two-point resolvent equivalent overcomes a key limitation of conventional single-point deterministic equivalents, combining random matrix theory, Stieltjes transforms, and a dynamical model of SGD. The framework yields explicit asymptotic expressions for the generalization error, recovering known results while extending to new settings such as non-isotropic data and general step-size schedules. Theoretical predictions agree closely with numerical experiments across diverse model configurations, providing a unified, principled tool for exact asymptotic analysis of high-dimensional learning problems driven by SGD.
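The building block behind such results is the concentration of resolvent traces around a deterministic limit given by a self-consistent Stieltjes-transform equation. As a hedged illustration (the classical single-point case, not the paper's two-point extension), the sketch below checks numerically that for isotropic Gaussian data the normalized resolvent trace (1/d) tr (W - zI)^{-1} matches the Marchenko-Pastur prediction; the values of n, d, and z are illustrative choices, not taken from the paper.

```python
import numpy as np

# Minimal sketch of a (single-point) deterministic equivalent, assuming
# isotropic Gaussian data. The paper's two-point result concerns products
# of two resolvents, but the concentration phenomenon is the same in spirit.
rng = np.random.default_rng(0)
n, d = 4000, 1000            # samples, dimension; aspect ratio gamma = d/n
gamma = d / n
z = -0.5                     # spectral argument; z < 0 keeps W - z*I invertible

X = rng.standard_normal((n, d)) / np.sqrt(n)
W = X.T @ X                  # sample covariance; spectrum ~ Marchenko-Pastur

# Empirical Stieltjes transform: (1/d) tr (W - z I)^{-1}
m_emp = np.trace(np.linalg.inv(W - z * np.eye(d))) / d

# Deterministic equivalent: Marchenko-Pastur self-consistent equation
#   m(z) = 1 / (1 - gamma - z - gamma * z * m(z)),
# solved here by fixed-point iteration.
m = 1.0
for _ in range(200):
    m = 1.0 / (1.0 - gamma - z - gamma * z * m)

print(f"empirical  m(z) = {m_emp:.5f}")
print(f"predicted  m(z) = {m:.5f}")
```

On a single draw the two values typically agree to a few decimal places, and the fluctuations shrink as d grows; this is exactly the kind of high-dimensional concentration that makes deterministic-equivalent analyses of SGD tractable.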
📝 Abstract
We derive a novel deterministic equivalence for the two-point function of a random matrix resolvent. Using this result, we give a unified derivation of the performance of a wide variety of high-dimensional linear models trained with stochastic gradient descent. This includes high-dimensional linear regression, kernel regression, and random feature models. Our results include previously known asymptotics as well as novel ones.
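To make the SGD setting concrete, here is a minimal one-pass SGD simulation for high-dimensional linear regression with isotropic Gaussian covariates. It produces the empirical population-risk trajectory that results of this kind predict in closed form; the teacher vector w_star, noise level sigma, and step size eta are hypothetical choices for illustration, not values from the paper.

```python
import numpy as np

# One-pass SGD on squared loss for a teacher-student linear regression,
# assuming isotropic Gaussian covariates and Gaussian label noise.
rng = np.random.default_rng(1)
d = 500                      # dimension
sigma = 0.1                  # label-noise std (hypothetical choice)
eta = 0.5 / d                # constant step size; O(1/d) scaling keeps SGD stable
steps = 20 * d

w_star = rng.standard_normal(d) / np.sqrt(d)   # teacher with ||w_star|| ~ 1
w = np.zeros(d)

risks = []
for t in range(steps):
    x = rng.standard_normal(d)
    y = x @ w_star + sigma * rng.standard_normal()
    w -= eta * (x @ w - y) * x               # SGD step on one fresh sample
    if t % d == 0:
        # Population risk for isotropic data: ||w - w_star||^2 + sigma^2
        risks.append(np.sum((w - w_star) ** 2) + sigma ** 2)

print(np.round(risks, 4))                    # risk decays toward the noise floor
```

Averaging such curves over runs gives the empirical generalization error whose large-d limit the deterministic-equivalent framework characterizes exactly.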