StochGradAdam: Accelerating Neural Networks Training with Stochastic Gradient Sampling

📅 2023-10-25

🏛️ arXiv.org

📈 Citations: 5

✨ Influential: 0

🤖 AI Summary

To address the high computational cost of gradient updates in neural network training, this paper proposes StochGradAdam, an optimizer that integrates stochastic gradient sampling into the Adam framework. At each iteration, it updates the model using only a randomly sampled subset of the gradients. Like standard Adam, it retains adaptive learning rates and bias correction; the sampling step is intended to broaden exploration of the loss landscape and reduce the impact of noisy gradients while keeping convergence stable. Experiments on image classification and semantic segmentation show that StochGradAdam achieves accuracy comparable or superior to Adam while using fewer gradient updates per iteration. The authors suggest the approach is particularly effective for large-scale models and datasets.

📝 Abstract

In this paper, we introduce StochGradAdam, a novel optimizer designed as an extension of the Adam algorithm, incorporating stochastic gradient sampling techniques to improve computational efficiency while maintaining robust performance. StochGradAdam optimizes by selectively sampling a subset of gradients during training, reducing the computational cost while preserving the advantages of adaptive learning rates and bias corrections found in Adam. Our experimental results, applied to image classification and segmentation tasks, demonstrate that StochGradAdam can achieve comparable or superior performance to Adam, even when using fewer gradient updates per iteration. By focusing on key gradient updates, StochGradAdam offers stable convergence and enhanced exploration of the loss landscape, while mitigating the impact of noisy gradients. The results suggest that this approach is particularly effective for large-scale models and datasets, providing a promising alternative to traditional optimization techniques for deep learning applications.
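The update described in the abstract can be sketched roughly as follows. This is a minimal NumPy illustration, not the authors' implementation: it assumes the sampling is an independent elementwise mask that keeps each gradient entry with probability `sampling_rate` and zeroes the rest, followed by an otherwise standard Adam step. The function names, the `sampling_rate` parameter, its default value, and the toy quadratic objective are all hypothetical choices made for this example.

```python
import numpy as np


def stoch_grad_adam_step(param, grad, m, v, t, rng,
                         lr=1e-3, beta1=0.9, beta2=0.999,
                         eps=1e-8, sampling_rate=0.8):
    """One Adam step applied to a randomly masked gradient (illustrative sketch)."""
    # Assumption: each gradient entry is kept independently with
    # probability `sampling_rate`; the remaining entries are set to zero.
    mask = rng.random(grad.shape) < sampling_rate
    sampled_grad = np.where(mask, grad, 0.0)

    # Standard Adam moment estimates, computed on the sampled gradient.
    m = beta1 * m + (1.0 - beta1) * sampled_grad
    v = beta2 * v + (1.0 - beta2) * sampled_grad ** 2

    # Standard Adam bias correction (t is the 1-based step count).
    m_hat = m / (1.0 - beta1 ** t)
    v_hat = v / (1.0 - beta2 ** t)

    param = param - lr * m_hat / (np.sqrt(v_hat) + eps)
    return param, m, v


def minimize_quadratic(steps=2000, seed=0, sampling_rate=0.8):
    """Toy usage: minimize ||param - target||^2 with the sketch above."""
    rng = np.random.default_rng(seed)
    target = np.array([1.0, -2.0, 3.0])
    param = np.zeros_like(target)
    m = np.zeros_like(target)
    v = np.zeros_like(target)
    for t in range(1, steps + 1):
        grad = 2.0 * (param - target)  # exact gradient of the toy objective
        param, m, v = stoch_grad_adam_step(
            param, grad, m, v, t, rng, lr=0.01, sampling_rate=sampling_rate
        )
    return param, target


if __name__ == "__main__":
    final_param, target = minimize_quadratic()
    print("final:", final_param, "target:", target)
```

Under this assumption, `sampling_rate=1.0` reduces the step to plain Adam and `sampling_rate=0.0` leaves the parameters unchanged. Note that in this sketch the mask is applied after the full gradient is available, so it illustrates the update rule rather than any compute savings.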

Problem

Research questions and friction points this paper is trying to address.

Enhance neural network training efficiency

Reduce computational cost in optimization

Improve performance in large-scale models

Innovation

Methods, ideas, or system contributions that make the work stand out.

StochGradAdam extends Adam with gradient sampling

Selective gradient sampling reduces computational cost

Reported to be particularly effective for large-scale models and datasets

🔎 Similar Papers

Multiple importance sampling for stochastic gradient estimation