On the Performance Analysis of Momentum Method: A Frequency Domain Perspective

📅 2024-11-29
🏛️ arXiv.org
📈 Citations: 0
Influential: 0
🤖 AI Summary
The lack of theoretical guidance for momentum coefficient selection hinders a deeper understanding of momentum-based optimization. This paper establishes a time-varying gradient-filtering interpretation of momentum optimizers, revealing their dynamic frequency-response characteristics: during early training, gradients are preserved with a broad-band response; in later stages, low-frequency components are progressively amplified while high-frequency noise is suppressed. Building on this insight, the authors propose Frequency Stochastic Gradient Descent with Momentum (FSGDM), which dynamically adjusts the momentum filtering characteristic over the course of training. Experiments across multiple benchmark models and datasets show that FSGDM improves convergence speed and generalization over conventional momentum optimizers such as SGD with momentum, supporting both the efficacy and the practicality of the proposed frequency-domain analytical framework.

📝 Abstract
Momentum-based optimizers are widely adopted for training neural networks. However, the optimal selection of momentum coefficients remains elusive. This uncertainty impedes a clear understanding of the role of momentum in stochastic gradient methods. In this paper, we present a frequency domain analysis framework that interprets the momentum method as a time-variant filter for gradients, where adjustments to momentum coefficients modify the filter characteristics. Our experiments support this perspective and provide a deeper understanding of the mechanism involved. Moreover, our analysis reveals the following significant findings: high-frequency gradient components are undesired in the late stages of training; preserving the original gradient in the early stages, and gradually amplifying low-frequency gradient components during training both enhance performance. Based on these insights, we propose Frequency Stochastic Gradient Descent with Momentum (FSGDM), a heuristic optimizer that dynamically adjusts the momentum filtering characteristic with an empirically effective dynamic magnitude response. Experimental results demonstrate the superiority of FSGDM over conventional momentum optimizers.
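The filtering interpretation above can be checked numerically: the heavy-ball momentum update v_t = β·v_{t-1} + g_t is a one-pole IIR filter on the gradient sequence with transfer function H(z) = 1 / (1 − β·z⁻¹). A minimal sketch (an illustration of the general interpretation, not code from the paper):

```python
import numpy as np

def momentum_magnitude_response(beta, n_freqs=512):
    """Magnitude response of heavy-ball momentum v_t = beta*v_{t-1} + g_t,
    viewed as a one-pole IIR filter H(z) = 1 / (1 - beta * z^{-1})."""
    w = np.linspace(0.0, np.pi, n_freqs)        # normalized frequency in [0, pi]
    H = 1.0 / (1.0 - beta * np.exp(-1j * w))    # frequency response H(e^{jw})
    return w, np.abs(H)

# Small beta: near-unity broad-band response (the gradient is roughly preserved).
# Large beta: the DC gain 1/(1-beta) amplifies low-frequency components while
# the Nyquist gain 1/(1+beta) suppresses high-frequency gradient noise.
for beta in (0.1, 0.9):
    w, mag = momentum_magnitude_response(beta)
    print(f"beta={beta}: gain at DC = {mag[0]:.2f}, at Nyquist = {mag[-1]:.2f}")
```

For β = 0.9 the DC gain is 1/(1 − 0.9) = 10 while the Nyquist gain is 1/(1 + 0.9) ≈ 0.53, which matches the paper's observation that large momentum coefficients enhance low-frequency components and damp high-frequency ones.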
Problem

Research questions and friction points this paper is trying to address.

Optimal momentum coefficient selection remains elusive, with little theoretical guidance
The role of momentum in stochastic gradient methods is not clearly understood
A fixed momentum coefficient cannot adapt the gradient-filtering behavior to the training stage
Innovation

Methods, ideas, or system contributions that make the work stand out.

Frequency-domain interpretation of momentum as a time-variant gradient filter
Dynamic momentum coefficient adjustment across training stages
Frequency Stochastic Gradient Descent with Momentum (FSGDM)
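The dynamic-adjustment idea can be illustrated with a simple ramped schedule. The linear ramp and the `scheduled_beta` helper below are hypothetical, chosen only to show the early-broad-band / late-low-pass behavior; they are not the paper's actual FSGDM schedule:

```python
import numpy as np

def scheduled_beta(step, total_steps, beta_min=0.0, beta_max=0.95):
    """Hypothetical linear ramp: small beta early (broad-band, gradient
    preserved), large beta late (strong low-pass, noise suppressed).
    Not the paper's exact FSGDM schedule."""
    frac = min(step / total_steps, 1.0)
    return beta_min + (beta_max - beta_min) * frac

def momentum_step(params, grad, velocity, lr, beta):
    # Standard heavy-ball update using the scheduled coefficient.
    velocity = beta * velocity + grad
    return params - lr * velocity, velocity

# Minimize f(x) = x^2 with the ramped momentum coefficient.
x, v = np.array([5.0]), np.array([0.0])
for t in range(200):
    beta = scheduled_beta(t, 200)
    x, v = momentum_step(x, 2.0 * x, v, lr=0.05, beta=beta)
print(float(x[0]))  # close to the minimizer at 0
```

The update rule itself is plain heavy-ball momentum; only the coefficient changes over time, which is what alters the filter's magnitude response as training progresses.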
👥 Authors
Xianliang Li
Shenzhen Institute of Advanced Technology, University of Chinese Academy of Sciences
Jun Luo
Shenzhen Institute of Advanced Technology, University of Chinese Academy of Sciences
Zhiwei Zheng
University of California, Berkeley
Hanxiao Wang
CASIA
Li Luo
Sun Yat-sen University
Lingkun Wen
University of Chinese Academy of Sciences, Shanghai Astronomical Observatory
Linlong Wu
University of Luxembourg
Sheng Xu
Shenzhen Institute of Advanced Technology