🤖 AI Summary
This work addresses the efficient and robust approximation of the Hessian (or its inverse) in stochastic optimization, unifying the formulation across Euclidean space, the symmetric positive-definite (SPD) manifold, and general Lie groups. We establish, for the first time, that under mild conditions the Hessian-fitting objective on a Lie group is strongly convex, which transforms the original ill-conditioned problem into a well-conditioned Lie-group optimization task. Leveraging this property, we propose an efficient preconditioner construction method based on Hessian-vector products and sparse structural priors. We further conduct a systematic geometric analysis comparing the adaptability and convergence behavior of second-order and adaptive methods, including BFGS, natural gradient, and PSGD, within this unified framework. Theoretical analysis and empirical evaluation indicate that our approach improves the efficiency, numerical stability, and scalability of second-order information utilization in large-scale stochastic optimization.
📝 Abstract
This report studies fitting the Hessian or its inverse for stochastic optimization using a Hessian fitting criterion from the preconditioned stochastic gradient descent (PSGD) method, which is closely related to many commonly used second-order and adaptive gradient optimizers, e.g., BFGS, the Gauss-Newton algorithm, natural gradient descent, and AdaGrad. Our analyses reveal the efficiency and reliability differences among a wide range of preconditioner fitting methods, from closed-form to iterative solutions, using Hessian-vector products or stochastic gradients only, with Hessian fitting in Euclidean space, on the manifold of symmetric positive definite (SPD) matrices, and on a variety of Lie groups. The most intriguing discovery is that Hessian fitting itself, as an optimization problem, is strongly convex under mild conditions on certain general Lie groups. This discovery turns Hessian fitting into a well-behaved Lie group optimization problem and facilitates the design of highly efficient and elegant Lie group sparse preconditioner fitting methods for large-scale stochastic optimization.
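As a concrete illustration of the fitting criterion mentioned in the abstract, the sketch below assumes the form in which it is commonly stated for PSGD: c(P) = E[hᵀPh + vᵀP⁻¹v], where v ~ N(0, I) is a probe vector and h = Hv is the corresponding Hessian-vector product. Under that assumption the expectation can be taken exactly, giving c(P) = tr(PH²) + tr(P⁻¹), whose minimizer over SPD matrices is P = (H²)^(-1/2), i.e. H⁻¹ when H is itself SPD. The function names and the test matrix are hypothetical and chosen for illustration only. This is the closed-form Euclidean case, not the Lie group fitting methods the report develops.

```python
import numpy as np

def criterion(P, H):
    """Assumed PSGD fitting criterion with v ~ N(0, I) and h = H v.

    Since E[v v^T] = I and H is symmetric, E[h^T P h + v^T P^{-1} v]
    equals tr(P H^2) + tr(P^{-1}).
    """
    return np.trace(P @ H @ H) + np.trace(np.linalg.inv(P))

def closed_form_preconditioner(H):
    """Minimizer of the criterion over SPD P: P = (H^2)^{-1/2}."""
    w, U = np.linalg.eigh(H @ H)      # H^2 is symmetric positive definite
    return (U / np.sqrt(w)) @ U.T     # U diag(w^{-1/2}) U^T

# Hypothetical SPD Hessian, used only for this illustration.
rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))
H = A @ A.T + 4.0 * np.eye(4)

P = closed_form_preconditioner(H)
P_perturbed = P + 0.01 * np.eye(4)    # a nearby SPD candidate

print("P equals inv(H):", np.allclose(P, np.linalg.inv(H)))
print("criterion at P:        ", criterion(P, H))
print("criterion at perturbed:", criterion(P_perturbed, H))
```

For large problems H is never formed explicitly; only samples of the pair (v, h) are available. That is why the abstract distinguishes closed-form from iterative solutions and fittings that use Hessian-vector products from those that use stochastic gradients only.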