🤖 AI Summary
To address the poor generalization and weak robustness of parameter-efficient fine-tuning (PEFT) methods under resource constraints, this paper proposes AdaZo-SAM, the first zeroth-order adaptive Sharpness-Aware Minimization framework requiring only a single gradient computation per iteration, together with LORENZA, a memory-efficient variant that retains full-parameter updates while compressing optimizer memory through adaptive low-rank gradient projection computed via randomized SVD. By integrating zeroth-order stochastic estimation, SAM-style perturbations, Adam-style adaptivity, and low-rank subspace projection, the approach comes with convergence guarantees while substantially improving multi-task generalization. Experiments on large language model (LLM) fine-tuning demonstrate state-of-the-art trade-offs between accuracy and GPU memory efficiency: memory consumption is reduced by up to 67%, and cross-task average accuracy improves by 2.8%.
📝 Abstract
We study robust parameter-efficient fine-tuning (PEFT) techniques designed to improve accuracy and generalization while operating within strict computational and memory hardware constraints, focusing specifically on large language models (LLMs). Existing PEFT methods often lack robustness and fail to generalize effectively across diverse tasks, leading to suboptimal performance in real-world scenarios. To address this, we present a new, highly computationally efficient framework called AdaZo-SAM, which combines Adam and Sharpness-Aware Minimization (SAM) while requiring only a single gradient computation per iteration. This is achieved by using a stochastic zeroth-order estimate to find SAM's ascent perturbation. We provide a convergence guarantee for AdaZo-SAM and show that it improves the generalization ability of state-of-the-art PEFT methods. Additionally, we design a low-rank gradient optimization method named LORENZA, a memory-efficient version of AdaZo-SAM. LORENZA uses a randomized SVD scheme to efficiently compute the subspace projection matrix and applies optimization steps within the selected subspace. This technique enables full-parameter fine-tuning with adaptive low-rank gradient updates, achieving the same reduced memory consumption as gradient-low-rank-projection methods. We provide a convergence analysis of LORENZA and demonstrate its merits for pre-training and fine-tuning LLMs.
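To make the single-gradient idea concrete, below is a minimal PyTorch sketch of how a zeroth-order SAM step combined with an Adam-style update could be assembled. It assumes a two-point Gaussian finite-difference estimator for the ascent perturbation and plain Adam bookkeeping; the function names (`init_adam_state`, `zo_sam_adam_step`), the smoothing parameter `mu`, and the perturbation radius `rho` are illustrative assumptions, not the paper's exact algorithm.

```python
import torch

def init_adam_state(model):
    """Allocate Adam moment buffers for all trainable parameters."""
    params = [p for p in model.parameters() if p.requires_grad]
    return {"t": 0,
            "m": [torch.zeros_like(p) for p in params],
            "v": [torch.zeros_like(p) for p in params]}

def zo_sam_adam_step(model, loss_fn, x, y, state,
                     rho=0.05, mu=1e-3, lr=1e-3, betas=(0.9, 0.999), eps=1e-8):
    """One zeroth-order SAM step with an Adam-style update (illustrative sketch).

    The ascent perturbation is estimated with a two-point finite difference
    along a random Gaussian direction (forward passes only), so only a single
    backward pass, taken at the perturbed point, is needed per iteration.
    """
    params = [p for p in model.parameters() if p.requires_grad]

    # --- Zeroth-order estimate of the ascent direction (no backprop) ---
    with torch.no_grad():
        u = [torch.randn_like(p) for p in params]        # random probe direction
        loss0 = loss_fn(model(x), y).item()              # L(w)
        for p, d in zip(params, u):
            p.add_(d, alpha=mu)                          # w <- w + mu * u
        loss1 = loss_fn(model(x), y).item()              # L(w + mu * u)
        for p, d in zip(params, u):
            p.sub_(d, alpha=mu)                          # restore w
        coef = (loss1 - loss0) / mu                      # directional-derivative estimate
        g_hat = [coef * d for d in u]
        g_norm = torch.sqrt(sum((g ** 2).sum() for g in g_hat)) + 1e-12
        for p, g in zip(params, g_hat):                  # move to the SAM ascent point
            p.add_(g / g_norm, alpha=rho)                # w <- w + rho * g_hat / ||g_hat||

    # --- Single backward pass, at the perturbed point ---
    model.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()

    # --- Undo the perturbation, then apply an Adam-style descent update ---
    with torch.no_grad():
        for p, g in zip(params, g_hat):
            p.sub_(g / g_norm, alpha=rho)
        state["t"] += 1
        b1, b2 = betas
        for p, m, v in zip(params, state["m"], state["v"]):
            m.mul_(b1).add_(p.grad, alpha=1 - b1)
            v.mul_(b2).addcmul_(p.grad, p.grad, value=1 - b2)
            m_hat = m / (1 - b1 ** state["t"])
            v_hat = v / (1 - b2 ** state["t"])
            p.addcdiv_(m_hat, v_hat.sqrt() + eps, value=-lr)
    return loss.item()
```

A training loop would call `init_adam_state(model)` once and then `zo_sam_adam_step(...)` per batch; the only backward pass per iteration is the one at the perturbed point, which is the efficiency gain the abstract describes relative to standard two-gradient SAM.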
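The low-rank projection step can be sketched in a similar hedged way: the gradient of a 2-D parameter is projected onto a subspace spanned by its top left singular vectors, obtained with randomized SVD (`torch.svd_lowrank`), and the Adam moments are kept only in that subspace. This is a GaLore-style approximation of the idea described in the abstract; the refresh schedule, default rank, and the name `lowrank_projected_step` are assumptions for illustration, not LORENZA's exact procedure.

```python
import torch

def lowrank_projected_step(weight, grad, state, rank=8, lr=1e-3,
                           betas=(0.9, 0.999), eps=1e-8, refresh_every=200):
    """Adam-style update confined to a low-rank gradient subspace (illustrative sketch).

    `weight` and `grad` are (m, n) tensors for one layer; `state` holds the
    projector and the Adam moments, which live only in the rank-r subspace.
    """
    # Periodically (re)compute the projection matrix from the current gradient
    # with randomized SVD, keeping the top-`rank` left singular vectors.
    if state["t"] % refresh_every == 0:
        U, _, _ = torch.svd_lowrank(grad, q=rank)
        state["P"] = U                                    # (m, rank) orthonormal basis
        state["m"] = grad.new_zeros(rank, grad.shape[1])  # first moment, (rank, n)
        state["v"] = grad.new_zeros(rank, grad.shape[1])  # second moment, (rank, n)
    state["t"] += 1

    P = state["P"]
    r_grad = P.T @ grad                                   # project gradient to (rank, n)

    # Adam moments are maintained only in the compressed subspace.
    b1, b2 = betas
    state["m"].mul_(b1).add_(r_grad, alpha=1 - b1)
    state["v"].mul_(b2).addcmul_(r_grad, r_grad, value=1 - b2)
    m_hat = state["m"] / (1 - b1 ** state["t"])
    v_hat = state["v"] / (1 - b2 ** state["t"])
    update = m_hat / (v_hat.sqrt() + eps)

    # Map the low-rank update back to the full parameter space and apply it.
    weight.add_(P @ update, alpha=-lr)
```

Each 2-D layer would keep its own `state = {"t": 0}` dictionary and call this after its gradient is computed; optimizer memory then scales with `rank × n` instead of `m × n`, matching the reduced footprint the abstract attributes to gradient-low-rank-projection methods, and the zeroth-order SAM perturbation from the previous sketch can in principle be layered on top, roughly mirroring how the abstract describes LORENZA as a memory-efficient version of AdaZo-SAM.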