Towards Fast LLM Fine-tuning through Zeroth-Order Optimization with Projected Gradient-Aligned Perturbations

📅 2025-10-20

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Zeroth-order optimization for fine-tuning large language models (LLMs) suffers from high gradient estimation variance, slow convergence, and substantial computational overhead. To address these challenges, we propose P-GAP (Projected Gradient-Aligned Perturbations), a zeroth-order method that first estimates a low-dimensional gradient subspace and then aligns the perturbations with the projected gradient direction inside that subspace. This design reduces the number of perturbed parameters and the variance of the gradient estimate while avoiding redundant computation. Experiments show that P-GAP improves accuracy by up to 6% on classification tasks and by up to 12% on generation tasks over baseline zeroth-order methods. It also needs up to about 81% fewer training iterations and 70% fewer GPU hours, so the efficiency gains do not come at the cost of model performance.

📝 Abstract

Fine-tuning large language models (LLMs) using zeroth-order (ZO) optimization has emerged as a promising alternative to traditional gradient-based methods due to its reduced memory footprint. However, existing ZO methods suffer from high variance in gradient estimation, leading to slow convergence and suboptimal performance on large-scale models. In this work, we propose P-GAP, a fast LLM fine-tuning approach through zeroth-order optimization with Projected Gradient-Aligned Perturbations. Specifically, we first estimate a low-dimensional gradient space and then align perturbations with the direction of the projected gradient within that space. This approach reduces the number of perturbed parameters and decreases variance, thereby accelerating convergence for LLM fine-tuning. Experiments on LLMs show that P-GAP consistently surpasses the baselines, achieving up to a 6% increase in accuracy on classification tasks and up to 12% higher accuracy on generation tasks, with up to about 81% fewer training iterations and 70% fewer GPU hours. These results demonstrate that P-GAP enables fast, scalable, and resource-efficient ZO LLM fine-tuning.
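To make the idea of perturbing inside a low-dimensional space concrete, here is a minimal sketch of a standard two-point zeroth-order gradient estimate whose perturbation can be restricted to the span of a given basis. It is not the P-GAP algorithm: the abstract does not specify how the gradient space is estimated or how the alignment step works, so both are left out. The function name `zo_gradient_estimate`, the `basis` argument, and the toy quadratic loss are illustrative assumptions.

```python
import numpy as np

def zo_gradient_estimate(loss_fn, theta, basis=None, eps=1e-3, rng=None):
    """Two-point zeroth-order gradient estimate (illustrative sketch).

    loss_fn : callable mapping a parameter vector to a scalar loss
    theta   : current parameters, shape (d,)
    basis   : optional (d, r) matrix with orthonormal columns; when given,
              the perturbation is drawn inside span(basis) (an assumption
              standing in for the paper's low-dimensional gradient space)
    """
    rng = np.random.default_rng() if rng is None else rng
    d = theta.shape[0]
    if basis is None:
        # Full-space Gaussian perturbation: d random coordinates.
        z = rng.standard_normal(d)
    else:
        # Subspace perturbation: only r random coordinates, lifted to d dims.
        z = basis @ rng.standard_normal(basis.shape[1])
    # Central finite difference along z needs only two forward passes.
    slope = (loss_fn(theta + eps * z) - loss_fn(theta - eps * z)) / (2.0 * eps)
    return slope * z

# Toy usage with a hypothetical quadratic loss and a random rank-4 basis.
rng = np.random.default_rng(0)
d, r = 50, 4
basis, _ = np.linalg.qr(rng.standard_normal((d, r)))  # orthonormal columns
theta = rng.standard_normal(d)
loss = lambda t: 0.5 * float(t @ t)
g_sub = zo_gradient_estimate(loss, theta, basis=basis, rng=rng)
```

With a basis supplied, the returned estimate always lies in the span of that basis, so only `r` random coordinates are drawn per step instead of `d`.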

Problem

Research questions and friction points this paper is trying to address.

Reducing gradient estimation variance in zeroth-order LLM fine-tuning

Accelerating convergence speed for large-scale language model optimization

Decreasing computational resource requirements during LLM fine-tuning
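The link between perturbation dimensionality and estimation variance can be checked numerically. The sketch below is a toy Monte Carlo experiment, not a result from the paper: for a linear loss and Gaussian perturbations, the single-sample estimate (g · z) z has expected squared error of (d + 1)‖g‖² in the full d-dimensional space and (r + 1)‖g‖² when z is drawn from an r-dimensional subspace that contains g. The helper `zo_mse` and the assumption that the true gradient lies exactly in the subspace are illustrative.

```python
import numpy as np

def zo_mse(grad, basis=None, n_samples=20000, seed=0):
    """Monte Carlo estimate of E||g_hat - grad||^2 for g_hat = (grad . z) z.

    Uses the exact directional derivative, i.e. a linear loss, so the only
    error source is the randomness of the perturbation z.
    """
    rng = np.random.default_rng(seed)
    d = grad.shape[0]
    if basis is None:
        z = rng.standard_normal((n_samples, d))
    else:
        # Draw r coordinates per sample and lift them into the d-dim space.
        z = rng.standard_normal((n_samples, basis.shape[1])) @ basis.T
    g_hat = (z @ grad)[:, None] * z
    return float(np.mean(np.sum((g_hat - grad) ** 2, axis=1)))

rng = np.random.default_rng(1)
d, r = 200, 8
basis, _ = np.linalg.qr(rng.standard_normal((d, r)))  # orthonormal columns
grad = basis @ rng.standard_normal(r)  # assumption: gradient lies in the subspace

mse_full = zo_mse(grad)              # perturb all d coordinates
mse_sub = zo_mse(grad, basis=basis)  # perturb only inside the subspace
print(f"full-space MSE / ||g||^2: {mse_full / (grad @ grad):.1f}")
print(f"subspace MSE / ||g||^2:   {mse_sub / (grad @ grad):.1f}")
```

In practice the true gradient will only lie approximately in any estimated subspace, so the subspace estimate trades some bias for the lower variance shown here.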

Innovation

Methods, ideas, or system contributions that make the work stand out.

P-GAP uses zeroth-order optimization for fine-tuning

It aligns perturbations with projected gradient directions

This reduces the number of perturbed parameters and the variance of the gradient estimate, which speeds up convergence

🔎 Similar Papers

MoFO: Momentum-Filtered Optimizer for Mitigating Forgetting in LLM Fine-Tuning