Torque-Aware Momentum

📅 2024-12-25
🤖 AI Summary
Classical momentum methods for training deep neural networks suffer from oscillations and inefficient exploration when gradients are large and misaligned with the accumulated momentum. To address this, we propose Torque-Aware Momentum (TAM), the first optimizer that dynamically modulates momentum based on the angle between the current gradient and the historical momentum vector, enabling direction-adaptive damping. TAM introduces a geometrically grounded correction to the momentum projection that comes with theoretical stability guarantees while remaining plug-and-play compatible with both SGD and Adam. Extensive experiments demonstrate that TAM significantly improves generalization and robustness to distribution shifts in image classification and large language model fine-tuning. It converges faster and more stably while effectively suppressing training oscillations.
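Written out as a worked update, angle-based damping might look like the following. This is an illustrative assumption, not the paper's published rule: the smoothing coefficient γ, the lower bound δ, and the affine map from the cosine to the damping factor d_t are hypothetical choices. Here g_t is the new gradient, m_t the momentum buffer, w_t the parameters, β the momentum coefficient and η the learning rate, with cos φ_t taken as 0 when m_{t-1} = 0.

```latex
\begin{aligned}
\cos\varphi_t &= \frac{\langle g_t, m_{t-1} \rangle}{\lVert g_t \rVert \, \lVert m_{t-1} \rVert}
  && \text{(alignment of the new gradient with the old momentum)} \\
s_t &= \gamma\, s_{t-1} + (1-\gamma)\cos\varphi_t
  && \text{(smoothed alignment, } s_0 = 0\text{)} \\
d_t &= \delta + (1-\delta)\,\tfrac{1 + s_t}{2} \in [\delta, 1]
  && \text{(damping factor)} \\
m_t &= \beta\, m_{t-1} + d_t\, g_t, \qquad w_{t+1} = w_t - \eta\, m_t
\end{aligned}
```

If gradients stay aligned with the momentum, s_t approaches 1, d_t approaches 1, and the step reduces to classical momentum; a gradient pointing against the momentum is shrunk toward δ, which damps the oscillation a sharp reversal would otherwise cause.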

📝 Abstract
Efficiently exploring complex loss landscapes is key to the performance of deep neural networks. While momentum-based optimizers are widely used in state-of-the-art setups, classical momentum can still struggle with large, misaligned gradients, leading to oscillations. To address this, we propose Torque-Aware Momentum (TAM), which introduces a damping factor based on the angle between the new gradients and previous momentum, stabilizing the update direction during training. Empirical results show that TAM, which can be combined with both SGD and Adam, enhances exploration, handles distribution shifts more effectively, and improves generalization performance across various tasks, including image classification and large language model fine-tuning, when compared to classical momentum-based optimizers.
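The abstract states that TAM damps new gradients according to their angle with the previous momentum and that it can be combined with both SGD and Adam. The NumPy code below is a minimal sketch of how such damping could be wired into each optimizer, under stated assumptions; it is not the authors' implementation. The helpers `cosine` and `damping`, the smoothing coefficient `gamma`, the lower bound `floor`, and the classes `TAMSGDSketch` and `TAMAdamSketch` are all hypothetical.

```python
# Illustrative sketch only: the damping rule, hyperparameter names, and both classes
# are hypothetical assumptions, not the published TAM algorithm.
import numpy as np


def cosine(a: np.ndarray, b: np.ndarray, eps: float = 1e-12) -> float:
    """Cosine of the angle between two arrays; 0.0 if either is (near) zero."""
    denom = float(np.linalg.norm(a) * np.linalg.norm(b))
    if denom < eps:
        return 0.0
    return float(np.dot(a.ravel(), b.ravel()) / denom)


def damping(s: float, floor: float) -> float:
    """Map an alignment score s in [-1, 1] to a gradient weight in [floor, 1]."""
    return floor + (1.0 - floor) * 0.5 * (1.0 + s)


class TAMSGDSketch:
    """Hypothetical SGD-with-momentum variant: the new gradient is scaled by a
    damping factor that shrinks as it turns away from the previous momentum."""

    def __init__(self, lr=0.1, beta=0.9, gamma=0.9, floor=0.1):
        self.lr, self.beta, self.gamma, self.floor = lr, beta, gamma, floor
        self.m = None  # momentum buffer
        self.s = 0.0   # smoothed alignment score, starts neutral

    def step(self, w: np.ndarray, g: np.ndarray) -> np.ndarray:
        if self.m is None:
            self.m = np.zeros_like(w)
        # Smooth the cosine between the new gradient and the old momentum.
        self.s = self.gamma * self.s + (1.0 - self.gamma) * cosine(g, self.m)
        d = damping(self.s, self.floor)
        self.m = self.beta * self.m + d * g
        return w - self.lr * self.m


class TAMAdamSketch:
    """Hypothetical Adam variant: only the first moment receives the damped
    gradient; the second moment and bias correction follow standard Adam."""

    def __init__(self, lr=1e-3, beta1=0.9, beta2=0.999, gamma=0.9, floor=0.1, eps=1e-8):
        self.lr, self.beta1, self.beta2 = lr, beta1, beta2
        self.gamma, self.floor, self.eps = gamma, floor, eps
        self.m = self.v = None
        self.s, self.t = 0.0, 0

    def step(self, w: np.ndarray, g: np.ndarray) -> np.ndarray:
        if self.m is None:
            self.m, self.v = np.zeros_like(w), np.zeros_like(w)
        self.t += 1
        self.s = self.gamma * self.s + (1.0 - self.gamma) * cosine(g, self.m)
        d = damping(self.s, self.floor)
        self.m = self.beta1 * self.m + (1.0 - self.beta1) * d * g
        self.v = self.beta2 * self.v + (1.0 - self.beta2) * g * g
        m_hat = self.m / (1.0 - self.beta1 ** self.t)
        v_hat = self.v / (1.0 - self.beta2 ** self.t)
        return w - self.lr * m_hat / (np.sqrt(v_hat) + self.eps)


if __name__ == "__main__":
    # Usage: minimize f(w) = 0.5 * ||w||^2, whose gradient is w itself.
    for opt in (TAMSGDSketch(), TAMAdamSketch(lr=0.05)):
        w = np.array([3.0, -2.0])
        for _ in range(200):
            w = opt.step(w, g=w.copy())
        print(type(opt).__name__, np.round(w, 4))
```

In the Adam variant only the first moment sees the damped gradient, so the per-coordinate scaling stays identical to standard Adam; whether the paper applies the factor this way is an assumption. The `floor` term keeps a persistently opposed gradient from being ignored entirely, which could otherwise stop the stale momentum direction from ever reversing.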

Problem

Research questions and friction points this paper is trying to address.

Deep Neural Networks
Momentum Optimization
Gradient Alignment

Innovation

Methods, ideas, or system contributions that make the work stand out.

Torque-Aware Momentum
Dynamic Damping Adjustment
Gradient Alignment