🤖 AI Summary
Adaptive learning rate (LR) configuration in differential learning rate (DLR) optimization remains challenging due to the lack of principled, task-agnostic guidance. Method: We propose Hessian-guided DLR (Hi-DLR), the first framework to incorporate real-time local Hessian curvature estimation into DLR optimization. Hi-DLR groups parameters, computes dynamic Hessian approximations, quantifies gradient sensitivity per group, and employs a parameter-freezing mechanism, enabling online, group-wise LR adaptation and automatic identification and freezing of low-contribution parameters. Contribution/Results: Hi-DLR is model- and task-agnostic, lightweight, and establishes a new general fine-tuning paradigm. Experiments across diverse PEFT methods and full-parameter training tasks demonstrate that Hi-DLR significantly accelerates convergence, achieves state-of-the-art or baseline-comparable performance, and substantially reduces computational overhead.
📝 Abstract
Differential learning rate (DLR), a technique that applies different learning rates to different model parameters, has been widely used in deep learning and has achieved empirical success in its various forms. For example, parameter-efficient fine-tuning (PEFT) applies zero learning rates to most parameters so as to significantly reduce computational cost. At its core, DLR leverages the observation that different parameters can have different loss curvature, which is hard to characterize in general. We propose the Hessian-informed differential learning rate (Hi-DLR), an efficient approach that solves the hyperparameter optimization (HPO) of learning rates and adaptively captures the loss curvature for any model and optimizer. Given a proper grouping of parameters, we empirically demonstrate that Hi-DLR can improve convergence by dynamically determining the learning rates during training. Furthermore, we can quantify the influence of different parameters and freeze the less-contributing ones, which leads to a new PEFT method that automatically adapts to various tasks and models. Additionally, Hi-DLR exhibits comparable performance on various full-model training tasks.
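The abstract describes curvature-aware, group-wise learning rates with freezing of low-contribution parameters. As a rough illustration of that idea (not the authors' actual Hi-DLR algorithm), the hypothetical sketch below estimates per-group curvature on a toy objective via finite-difference Hessian diagonals, scales each group's learning rate inversely with its curvature (a Newton-like rule), and freezes groups whose gradients fall below a threshold. All names, the toy loss, and the thresholds are illustrative assumptions.

```python
import math

def loss(w):
    # Toy objective where the two "parameter groups" have very different
    # curvature: w[0] sits in a sharp direction, w[1] in a flat one.
    return 50.0 * w[0] ** 2 + 0.5 * w[1] ** 2

def grad(w, h=1e-5):
    # Central-difference gradient, one entry per parameter group.
    g = []
    for i in range(len(w)):
        wp, wm = list(w), list(w)
        wp[i] += h
        wm[i] -= h
        g.append((loss(wp) - loss(wm)) / (2 * h))
    return g

def curvature(w, h=1e-4):
    # Diagonal Hessian estimate via central second differences; a stand-in
    # for whatever Hessian approximation the real method uses.
    f0 = loss(w)
    c = []
    for i in range(len(w)):
        wp, wm = list(w), list(w)
        wp[i] += h
        wm[i] -= h
        c.append((loss(wp) - 2.0 * f0 + loss(wm)) / h ** 2)
    return c

def curvature_scaled_step(w, base_lr=1.0, eps=1e-8, freeze_tol=1e-3):
    # One update: each group gets lr_g = base_lr / (curvature_g + eps);
    # groups with near-zero gradient are frozen (left unchanged).
    g, c = grad(w), curvature(w)
    out = []
    for wi, gi, ci in zip(w, g, c):
        if abs(gi) < freeze_tol:
            out.append(wi)  # low-contribution group: frozen
        else:
            out.append(wi - base_lr * gi / (max(ci, 0.0) + eps))
    return out

w = [1.0, 1.0]
for _ in range(5):
    w = curvature_scaled_step(w)
```

On this quadratic, a single curvature-scaled step drives both groups to the minimum regardless of how sharp or flat each direction is, whereas a single shared learning rate would either diverge in the sharp direction or crawl in the flat one; this is the intuition behind applying different rates per group.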