🤖 AI Summary
In deep learning, quadratic approximations, such as Hessian estimates and Laplace approximations, constructed from mini-batches suffer from systematic bias, degrading both the convergence of second-order optimization and the reliability of uncertainty quantification. This work is the first to systematically identify the root cause: a statistical mismatch between the mini-batch gradient covariance and the true Hessian. Leveraging random matrix theory, the authors derive an analytical bias model characterizing this discrepancy. Building on this insight, they propose the first provably unbiased mini-batch quadratic estimation framework, which jointly corrects second-order optimization directions and posterior uncertainty estimates. Experiments across multiple deep neural networks demonstrate that the method reduces Hessian approximation bias by over 70%, significantly accelerates second-order optimization convergence, and improves predictive confidence calibration, reducing Expected Calibration Error (ECE) by up to 42%.
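The gradient-covariance/Hessian mismatch named above can already be observed in a toy model. The sketch below is a hypothetical illustration (not the paper's analysis): it evaluates a small logistic-regression loss at an arbitrary weight vector and compares the covariance of the per-example gradients with the true Hessian, which differ away from the optimum.

```python
import numpy as np

rng = np.random.default_rng(1)
n, d = 5000, 3

# Logistic-regression losses l_i(w) = log(1 + exp(-y_i x_i^T w)).
X = rng.normal(size=(n, d))
y = np.sign(X @ np.ones(d) + rng.normal(size=n))
w = 0.5 * np.ones(d)            # evaluate away from the optimum (illustrative choice)

z = y * (X @ w)
s = 1.0 / (1.0 + np.exp(z))     # sigma(-z_i)

# Per-example gradients g_i = -s_i * y_i * x_i.
G = -(s * y)[:, None] * X

# True Hessian: average of sigma'(z_i) * x_i x_i^T with sigma' = s(1-s).
h = s * (1.0 - s)
H = (X * h[:, None]).T @ X / n

# Empirical covariance of the per-example gradients.
C = np.cov(G, rowvar=False, bias=True)

# Relative Frobenius gap between gradient covariance and Hessian:
# clearly nonzero, i.e. the two matrices do not coincide here.
rel_gap = np.linalg.norm(C - H) / np.linalg.norm(H)
print(rel_gap)
```

Only near an optimum of a well-specified model do these two matrices approximately agree (via the Fisher information); in the mini-batch regime the summary describes, treating one as a stand-in for the other introduces exactly this kind of discrepancy.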
📝 Abstract
Quadratic approximations form a fundamental building block of machine learning methods. For example, second-order optimizers try to find the Newton step toward the minimum of a local quadratic proxy of the objective function, and the second-order approximation of a network's loss function can be used to quantify the uncertainty of its outputs via the Laplace approximation. When computations on the entire training set are intractable, as is typical in deep learning, the relevant quantities are computed on mini-batches. This, however, distorts and biases the shape of the associated stochastic quadratic approximations in an intricate way, with detrimental effects on applications. In this paper, we (i) show that this bias introduces a systematic error, (ii) provide a theoretical explanation for it, (iii) explain its relevance to second-order optimization and to uncertainty quantification via the Laplace approximation in deep learning, and (iv) develop and evaluate debiasing strategies.