SVRG and Beyond via Posterior Correction

📅 2025-12-01
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
To address the slow convergence and limited efficacy of SVRG-type methods in deep learning, this paper establishes, for the first time, a theoretical connection between SVRG and a recently proposed Bayesian method called posterior correction, showing that SVRG is recovered as posterior correction over the isotropic-Gaussian family. Building on this foundation, the authors derive two new SVRG variants from more flexible Gaussian families: (1) a Newton-like algorithm that employs novel Hessian corrections, and (2) an Adam-like adaptive extension tailored to pretraining and finetuning Transformer language models. The approach unifies stochastic variance reduction, variational inference, and exponential-family modeling, combining theoretical grounding with practical deployability. Experiments on Transformer-based language models demonstrate improvements in convergence speed and training efficiency, supporting posterior correction as a paradigm for deep-network optimization.

📝 Abstract
Stochastic Variance Reduced Gradient (SVRG) and its variants aim to speed up training by using gradient corrections, but have seen limited success in deep learning. Here, we show surprising new foundational connections of SVRG to a recently proposed Bayesian method called posterior correction. Specifically, we show that SVRG is recovered as a special case of posterior correction over the isotropic-Gaussian family, while novel extensions are automatically obtained by using more flexible exponential families. We derive two new SVRG variants by using Gaussian families: First, a Newton-like variant that employs novel Hessian corrections, and second, an Adam-like extension that improves pretraining and finetuning of Transformer language models. This is the first work to connect SVRG to Bayes and use it to boost variational training for deep networks.
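For context on the gradient corrections the abstract refers to, here is a minimal sketch of standard SVRG (the classical algorithm, not the paper's Bayesian variants) on a least-squares problem. All names (`A`, `b`, `lr`, `w_snap`) and the problem setup are illustrative choices, not from the paper.

```python
import numpy as np

# Least-squares objective: F(w) = (1/n) * sum_i (a_i . w - b_i)^2.
rng = np.random.default_rng(0)
n, d = 200, 5
A = rng.standard_normal((n, d))
w_true = rng.standard_normal(d)
b = A @ w_true

def grad_i(w, i):
    # Gradient of the i-th component loss (a_i . w - b_i)^2.
    return 2.0 * A[i] * (A[i] @ w - b[i])

def full_grad(w):
    # Full-batch gradient of F.
    return 2.0 * A.T @ (A @ w - b) / n

w = np.zeros(d)
lr = 0.05
for epoch in range(30):
    w_snap = w.copy()            # snapshot of the iterate
    g_snap = full_grad(w_snap)   # full gradient at the snapshot
    for _ in range(n):
        i = rng.integers(n)
        # Variance-reduced gradient: stochastic gradient plus a
        # correction term built from the snapshot.
        g = grad_i(w, i) - grad_i(w_snap, i) + g_snap
        w = w - lr * g

print(float(np.linalg.norm(w - w_true)))  # distance to w_true shrinks toward zero
```

The correction `- grad_i(w_snap, i) + g_snap` has zero mean over the index `i`, so the update is unbiased while its variance vanishes as the iterate approaches the snapshot; the paper reinterprets exactly this correction as a Bayesian posterior-correction step.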
Problem

Research questions and friction points this paper is trying to address.

SVRG-type methods promise faster training via gradient corrections but have seen limited success in deep learning
No prior theoretical link between SVRG and Bayesian methods such as posterior correction
Pretraining and finetuning Transformer language models remains slow and costly
Innovation

Methods, ideas, or system contributions that make the work stand out.

Connects SVRG to Bayesian posterior correction method
Derives Newton-like variant using Hessian corrections
Creates Adam-like extension for Transformer model training