SASSHA: Sharpness-aware Adaptive Second-order Optimization with Stable Hessian Approximation

📅 2025-02-25
📈 Citations: 0 · Influential: 0
🤖 AI Summary
Existing approximate second-order optimization methods often converge to sharp minima and, as a result, generalize worse than SGD. This paper proposes SASSHA, a second-order optimizer that integrates sharpness-aware optimization with a numerically stable Hessian approximation. SASSHA explicitly reduces the sharpness of the solution while stabilizing the approximate Hessian computation along the optimization trajectory, and its sharpness-minimization scheme is designed to accommodate lazy Hessian updates, keeping the added computational cost modest. Extensive experiments across diverse deep learning tasks show that SASSHA achieves generalization comparable to, and mostly better than, first-order methods such as SGD and Adam as well as existing second-order methods, with improved training stability and manageable overhead.
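The summary describes three ingredients: a sharpness-aware (SAM-style) perturbation, a numerically stabilized Hessian approximation, and lazy Hessian updates. Below is a minimal, hypothetical PyTorch sketch of how one such step could compose, assuming a Hutchinson-style diagonal Hessian estimate and sqrt(|diag H|) stabilization; the names `sassha_like_step`, `rho`, and `eps` are illustrative choices, not the authors' implementation.

```python
# Hypothetical sketch of a SASSHA-style update (not the authors' code).
# Assumptions: Hutchinson estimator for diag(H), SAM-style perturbation of
# radius rho, and sqrt(|diag H|) + eps preconditioning for stability.
import torch

def hutchinson_diag_hessian(loss, params, n_samples=1):
    """Estimate diag(H) as E[z * (Hz)] with Rademacher probes z."""
    grads = torch.autograd.grad(loss, params, create_graph=True)
    diag = [torch.zeros_like(p) for p in params]
    for _ in range(n_samples):
        zs = [torch.randint_like(p, high=2) * 2.0 - 1.0 for p in params]
        hvps = torch.autograd.grad(grads, params, grad_outputs=zs,
                                   retain_graph=True)
        for d, z, hvp in zip(diag, zs, hvps):
            d.add_(z * hvp / n_samples)
    return diag

def sassha_like_step(params, loss_fn, lr=0.1, rho=0.05, eps=1e-4, diag_h=None):
    """One sharpness-aware, Hessian-preconditioned step (illustrative)."""
    # 1) SAM-style ascent: move to the worst-case point within radius rho.
    grads = torch.autograd.grad(loss_fn(), params)
    gnorm = torch.sqrt(sum((g ** 2).sum() for g in grads)) + 1e-12
    perturbs = [rho * g / gnorm for g in grads]
    with torch.no_grad():
        for p, e in zip(params, perturbs):
            p.add_(e)

    # 2) Gradient at the perturbed point; refresh diag(H) only when asked
    #    (diag_h=None), which is what makes lazy Hessian updates possible.
    loss = loss_fn()
    if diag_h is None:
        diag_h = hutchinson_diag_hessian(loss, params)
    grads = torch.autograd.grad(loss, params)

    # 3) Undo the perturbation, then precondition by sqrt(|diag H|) + eps so
    #    near-zero or negative curvature cannot blow up the step.
    with torch.no_grad():
        for p, e, g, d in zip(params, perturbs, grads, diag_h):
            p.sub_(e)
            p.sub_(lr * g / (d.abs().sqrt() + eps))
    return diag_h
```

A full optimizer would wrap this in `torch.optim.Optimizer` and add momentum and weight decay; the sketch only shows how the three ingredients fit together.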

📝 Abstract
Approximate second-order optimization methods often exhibit poorer generalization compared to first-order approaches. In this work, we look into this issue through the lens of the loss landscape and find that existing second-order methods tend to converge to sharper minima compared to SGD. In response, we propose Sassha, a novel second-order method designed to enhance generalization by explicitly reducing sharpness of the solution, while stabilizing the computation of approximate Hessians along the optimization trajectory. In fact, this sharpness minimization scheme is crafted also to accommodate lazy Hessian updates, so as to secure efficiency besides flatness. To validate its effectiveness, we conduct a wide range of standard deep learning experiments where Sassha demonstrates its outstanding generalization performance that is comparable to, and mostly better than, other methods. We provide a comprehensive set of analyses including convergence, robustness, stability, efficiency, and cost.
Problem

Research questions and friction points this paper is trying to address.

Improves second-order optimization generalization
Reduces solution sharpness explicitly (see the objective below)
Stabilizes approximate Hessian computations
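For background, "reducing solution sharpness explicitly" usually refers to a min-max objective of the SAM form; the formula below is standard background, not an equation quoted from this paper:

```latex
% Sharpness-aware objective (standard background, not quoted from the paper):
% minimize the worst-case loss in an L2 ball of radius rho around the weights.
\min_{w} \; \max_{\|\epsilon\|_2 \le \rho} \; L(w + \epsilon)
```

SASSHA pairs this kind of sharpness-aware objective with a preconditioned second-order update instead of a plain gradient step.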
Innovation

Methods, ideas, or system contributions that make the work stand out.

Sharpness-aware second-order optimization
Stable Hessian approximation technique
Lazy Hessian updates for efficiency (see the training-loop sketch below)
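To make the efficiency claim concrete, here is a hypothetical training loop built on the `sassha_like_step` sketch above: the Hessian diagonal is recomputed only every `hessian_interval` steps and reused in between. The interval of 10 is an assumed value, not one reported in the paper.

```python
# Hypothetical lazy-update schedule (illustrative), reusing sassha_like_step
# from the sketch above; `params` and `loss_fn` are assumed to be defined.
hessian_interval = 10   # assumed refresh period, not a value from the paper
num_steps = 1000        # illustrative training length
diag_h = None
for step in range(num_steps):
    if step % hessian_interval == 0:
        diag_h = None   # force a fresh diag(H) estimate on this step
    diag_h = sassha_like_step(params, loss_fn, diag_h=diag_h)
```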
👥 Authors
Dahun Shin (POSTECH)
Dongyeop Lee (POSTECH)
Jinseok Chung (POSTECH)
Namhoon Lee (POSTECH)
machine learning