🤖 AI Summary
This work addresses the high memory overhead of large language model (LLM) fine-tuning caused by backpropagation and optimizer states, a challenge that traditional zeroth-order (ZO) optimization methods struggle to overcome due to the high variance of their gradient estimates and its strong dependence on parameter dimensionality. To this end, the paper proposes a strategy-driven zeroth-order optimization framework that, for the first time, introduces a learnable sampling strategy over perturbation directions into ZO optimization. By adaptively optimizing this sampling strategy, the method effectively reduces the variance of its gradient estimates. Theoretical analysis shows that the approach improves the quality of gradient information and weakens the explicit dependence of convergence bounds on parameter dimensionality. Extensive experiments across multiple LLM fine-tuning benchmarks show significant improvements over existing zeroth-order baselines, confirming the method's effectiveness and scalability.
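For context, the classical two-point ZO estimator that this summary refers to is commonly written as follows (standard notation, not taken verbatim from the paper):

$$
\hat{g}(\theta) \;=\; \frac{f(\theta + \epsilon u) - f(\theta - \epsilon u)}{2\epsilon}\, u,
\qquad u \sim \mathcal{N}(0, I_d),
$$

where $f$ is the loss and $\epsilon > 0$ is a small smoothing scale. As $\epsilon \to 0$, $\hat{g}$ is an unbiased estimate of $\nabla f(\theta)$ in expectation, but the mean-squared error of a single sample grows on the order of the dimensionality $d$, which is what makes the naive estimator problematic for billion-parameter models. Replacing the fixed isotropic Gaussian over $u$ with a learned sampling distribution is the lever this work uses to shrink that variance and, with it, the explicit $d$-dependence.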
📄 Abstract
Fine-tuning large pretrained language models (LLMs) is a cornerstone of modern NLP, yet its growing memory demands (driven by backpropagation and large optimizer states) limit deployment in resource-constrained settings. Zeroth-order (ZO) methods bypass backpropagation by estimating directional derivatives from forward evaluations, offering substantial memory savings. However, classical ZO estimators suffer from high variance and an adverse dependence on the parameter dimensionality $d$, which has confined their use to low-dimensional problems. In this work, we propose a policy-driven ZO framework that treats the sampling distribution over perturbation directions as a learnable policy and updates it to reduce the variance of directional estimates. We develop a practical algorithm implementing this idea and provide a theoretical analysis showing that learned sampling distributions improve the quality of gradient information and relax the explicit dependence on $d$ in convergence bounds. Empirically, we validate the approach on challenging LLM fine-tuning benchmarks, demonstrating substantially improved performance over standard ZO baselines. Our results suggest that adaptive direction sampling is a promising route to making ZO fine-tuning viable at scale. The source code is available at https://github.com/brain-lab-research/zo_ldsd
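The abstract's core mechanic (estimating gradients from two forward passes along a sampled perturbation direction, with the sampling distribution as a tunable object) can be sketched in a few lines. This is an illustrative toy, not the paper's algorithm: the function names, the per-coordinate scale `sigma` standing in for the learned sampling policy, and the plain ZO-SGD loop are all assumptions made for the sketch.

```python
import numpy as np

def zo_grad(f, theta, sigma, eps=1e-3, rng=None):
    """Two-point zeroth-order gradient estimate from forward passes only.

    `sigma` is a per-coordinate scale for the perturbation direction: an
    isotropic choice (sigma = 1) recovers the classical estimator, while a
    non-uniform sigma stands in for a learned sampling distribution.
    """
    rng = rng or np.random.default_rng()
    u = sigma * rng.standard_normal(theta.shape)  # sampled perturbation direction
    # Central-difference directional derivative: two forward evaluations,
    # no backpropagation, and no optimizer state beyond theta itself.
    d_dir = (f(theta + eps * u) - f(theta - eps * u)) / (2.0 * eps)
    return d_dir * u

def zo_sgd(f, theta0, sigma, steps=2000, lr=0.01, seed=0):
    """Plain ZO-SGD loop driven by the estimator above."""
    rng = np.random.default_rng(seed)
    theta = theta0.copy()
    for _ in range(steps):
        theta -= lr * zo_grad(f, theta, sigma, rng=rng)
    return theta

# Toy usage: minimize a 10-dimensional quadratic with the isotropic sampler.
f = lambda x: float(np.sum(x * x))
theta = zo_sgd(f, np.ones(10), sigma=np.ones(10))
```

In an LLM setting, `f` would be the fine-tuning loss computed by a forward pass, which is exactly why only two evaluations per step (and no backward graph) translate into the memory savings the abstract describes.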