SVRG and Beyond via Posterior Correction

📅 2025-12-01
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
To address the slow convergence and limited efficacy of SVRG-type methods in deep learning, this paper establishes, for the first time, a theoretical connection between SVRG and a recently proposed Bayesian method called posterior correction, showing that SVRG is recovered as posterior correction over the isotropic-Gaussian family. Building on this foundation, the authors derive two new SVRG variants from more flexible Gaussian families: (1) a Newton-like algorithm that employs novel Hessian corrections, and (2) an Adam-like adaptive extension tailored to pretraining and finetuning Transformer language models. The approach unifies stochastic variance reduction, variational inference, and exponential-family modeling, combining theoretical grounding with practical deployability. Experiments on Transformer-based language models demonstrate improvements in convergence speed and training efficiency, supporting posterior correction as a paradigm for deep-network optimization.

📝 Abstract
Stochastic Variance Reduced Gradient (SVRG) and its variants aim to speed up training by using gradient corrections, but have seen limited success in deep learning. Here, we show surprising new foundational connections of SVRG to a recently proposed Bayesian method called posterior correction. Specifically, we show that SVRG is recovered as a special case of posterior correction over the isotropic-Gaussian family, while novel extensions are automatically obtained by using more flexible exponential families. We derive two new SVRG variants by using Gaussian families: First, a Newton-like variant that employs novel Hessian corrections, and second, an Adam-like extension that improves pretraining and finetuning of Transformer language models. This is the first work to connect SVRG to Bayes and use it to boost variational training for deep networks.
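For context on the gradient corrections the abstract refers to, here is a minimal sketch of standard SVRG (the classical algorithm, not the paper's Bayesian variants) on a least-squares problem. All names (`A`, `b`, `lr`, `w_snap`) and the problem setup are illustrative choices, not from the paper.

```python
import numpy as np

# Least-squares objective: F(w) = (1/n) * sum_i (a_i . w - b_i)^2.
rng = np.random.default_rng(0)
n, d = 200, 5
A = rng.standard_normal((n, d))
w_true = rng.standard_normal(d)
b = A @ w_true

def grad_i(w, i):
    # Gradient of the i-th component loss (a_i . w - b_i)^2.
    return 2.0 * A[i] * (A[i] @ w - b[i])

def full_grad(w):
    # Full-batch gradient of F.
    return 2.0 * A.T @ (A @ w - b) / n

w = np.zeros(d)
lr = 0.05
for epoch in range(30):
    w_snap = w.copy()            # snapshot of the iterate
    g_snap = full_grad(w_snap)   # full gradient at the snapshot
    for _ in range(n):
        i = rng.integers(n)
        # Variance-reduced gradient: stochastic gradient plus a
        # correction term built from the snapshot.
        g = grad_i(w, i) - grad_i(w_snap, i) + g_snap
        w = w - lr * g

print(float(np.linalg.norm(w - w_true)))  # distance to w_true shrinks toward zero
```

The correction `- grad_i(w_snap, i) + g_snap` has zero mean over the index `i`, so the update is unbiased while its variance vanishes as the iterate approaches the snapshot; the paper reinterprets exactly this correction as a Bayesian posterior-correction step.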
Problem

Research questions and friction points this paper is trying to address.

SVRG-type methods promise faster training via gradient corrections but have seen limited success in deep learning
No prior theoretical link between SVRG and Bayesian methods such as posterior correction
Pretraining and finetuning Transformer language models remains slow and costly
Innovation

Methods, ideas, or system contributions that make the work stand out.

Connects SVRG to Bayesian posterior correction method
Derives Newton-like variant using Hessian corrections
Creates Adam-like extension for Transformer model training