Revisiting Sharpness-Aware Minimization: A More Faithful and Effective Implementation

📅 2026-03-09
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This work addresses the performance degradation of existing Sharpness-Aware Minimization (SAM) methods in multi-step ascent settings, which stems from coarse gradient approximations and a lack of theoretical grounding. The authors propose eXplicit SAM (XSAM), a novel framework that reinterprets SAM’s core mechanism as approximating the direction toward the point of maximal loss within a local neighborhood. Building on this insight, XSAM introduces an explicit and lightweight optimization strategy that directly estimates the maximal-loss direction and constructs an efficient search space by coherently integrating multi-step ascent gradient information. This unified approach seamlessly accommodates both single-step and multi-step configurations. Experimental results demonstrate that XSAM consistently outperforms existing SAM variants with negligible additional computational overhead, yielding significant improvements in model generalization.
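For context, the standard SAM objective and its widely used single-step approximation (restated here for reference, not this paper's XSAM update; w denotes the parameters, L the training loss, ρ the neighborhood radius, η the learning rate) can be written as:

```latex
% Standard SAM objective and its common single-step practical approximation.
% The derivative of \hat{\epsilon} with respect to w is neglected,
% as discussed in the abstract below.
\min_{w} \; \max_{\|\epsilon\|_2 \le \rho} L(w + \epsilon),
\qquad
\hat{\epsilon} = \rho \, \frac{\nabla L(w)}{\|\nabla L(w)\|_2},
\qquad
w \leftarrow w - \eta \, \nabla L\!\left(w + \hat{\epsilon}\right).
```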

📝 Abstract
Sharpness-Aware Minimization (SAM) enhances generalization by minimizing the maximum training loss within a predefined neighborhood around the parameters. However, its practical implementation approximates this objective as one or more gradient ascent steps followed by applying the gradient at the ascent point to update the current parameters. This practice can be justified as approximately optimizing the objective while neglecting the (full) derivative of the ascent point with respect to the current parameters. Nevertheless, a direct and intuitive understanding of why updating the current parameters with the gradient at the ascent point works so well is still lacking. Our work bridges this gap by proposing a novel and intuitive interpretation. We show that the gradient at the single-step ascent point, when applied to the current parameters, approximates the direction from the current parameters toward the maximum within the local neighborhood better than the local gradient does. This improved approximation thereby enables a more direct escape from the maximum within the local neighborhood. Our analysis further reveals two issues, however. First, the approximation given by the gradient at the single-step ascent point is often inaccurate. Second, the approximation quality may degrade as the number of ascent steps increases. To address these limitations, we propose eXplicit Sharpness-Aware Minimization (XSAM). It tackles the first issue by explicitly estimating the direction of the maximum during training, and the second by crafting a search space that effectively leverages the gradient information at the multi-step ascent point. XSAM features a unified formulation that applies to both single-step and multi-step settings and incurs only negligible computational overhead. Extensive experiments demonstrate the consistent superiority of XSAM over existing counterparts.
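To make the mechanism the abstract analyzes concrete, here is a minimal PyTorch sketch of the standard single-step SAM update that the paper takes as its starting point; it is not XSAM itself, whose explicit direction estimate is defined in the paper. The helper name `sam_update` and the default `rho` are illustrative assumptions.

```python
import torch

def sam_update(model, loss_fn, x, y, base_opt, rho=0.05):
    """One single-step SAM update: ascend to w + eps, then apply the
    gradient computed at the ascent point to the original parameters w.
    (Illustrative sketch; assumes every trainable parameter receives a
    gradient from the loss.)"""
    params = [p for p in model.parameters() if p.requires_grad]

    # 1) Local gradient g = grad L(w) at the current parameters.
    loss_fn(model(x), y).backward()
    grads = [p.grad.detach().clone() for p in params]
    grad_norm = torch.sqrt(sum((g ** 2).sum() for g in grads)) + 1e-12

    # 2) Perturb to the ascent point: eps = rho * g / ||g||.
    eps = [rho * g / grad_norm for g in grads]
    with torch.no_grad():
        for p, e in zip(params, eps):
            p.add_(e)
    model.zero_grad()

    # 3) Gradient at the ascent point, grad L(w + eps).
    loss = loss_fn(model(x), y)
    loss.backward()

    # 4) Restore w, then step using the ascent-point gradient.
    with torch.no_grad():
        for p, e in zip(params, eps):
            p.sub_(e)
    base_opt.step()
    base_opt.zero_grad()
    return loss.item()
```

The step the paper reinterprets is step 4: the gradient is computed at w + eps but applied at w, which the authors argue better approximates the direction toward the maximum within the local neighborhood.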
Problem

Research questions and friction points this paper is trying to address.

Sharpness-Aware Minimization
gradient approximation
local neighborhood
generalization
optimization
Innovation

Methods, ideas, or system contributions that make the work stand out.

Sharpness-Aware Minimization
gradient approximation
generalization
optimization
XSAM
Jianlong Chen
Key Laboratory of Interdisciplinary Research of Computation and Economics, Shanghai University of Finance and Economics
Zhiming Zhou
Shanghai University of Finance and Economics
Generalization · Optimization · GANs · Machine Learning · Computer Graphics