🤖 AI Summary
Existing zeroth-order (ZO) optimizers rely on hand-crafted, static perturbation strategies that adapt poorly to large language model (LLM) architectures, resulting in high memory overhead and limited generalization during fine-tuning. To address this, we propose ZO Fine-tuner, the first learnable ZO framework, which uses meta-learning to automatically discover task-aware perturbation directions via a lightweight neural network, replacing fixed sampling schemes. The framework is trained once per LLM and then reused across diverse downstream tasks without retraining, supports mainstream LLM architectures, and incurs minimal deployment overhead. Extensive experiments across four LLMs and seven benchmark datasets show that ZO Fine-tuner outperforms existing ZO optimizers on 82.1% of task–model combinations, substantially improving the adaptability, efficiency, and scalability of zeroth-order fine-tuning while preserving gradient-free operation.
📝 Abstract
Zeroth-order optimizers have recently emerged as a practical approach for fine-tuning large language models (LLMs), significantly reducing GPU memory consumption compared to traditional first-order methods. Yet existing zeroth-order methods rely on hand-crafted, static sampling strategies that do not adapt to model-specific structures. To address this, we propose ZO Fine-tuner, a learning-based zeroth-order optimizer for LLMs that automatically learns efficient perturbation strategies through a compact and memory-efficient design. Crucially, our approach is motivated by the observation that only a small number of foundation models and their derivatives are widely adopted in practice. Therefore, learning the optimizer once for a given LLM and reusing it across diverse downstream tasks is both feasible and highly desirable. Accordingly, ZO Fine-tuner is designed to scale learning-to-learn (L2L) to the foundation-model era by supporting one-time training per LLM with minimal overhead. Experiments on 4 LLMs and 7 datasets show that ZO Fine-tuner outperforms prior zeroth-order baselines in 82.1% of task–model combinations, demonstrating strong performance and scalability for efficient LLM fine-tuning. Our code is available at https://github.com/ASTRAL-Group/ZO_Fine_tuner.git.
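For readers unfamiliar with zeroth-order fine-tuning, the memory saving comes from estimating gradients with forward passes only: the model is perturbed along a random direction, the loss is evaluated at the two perturbed points, and the finite difference is projected back onto that direction. Below is a minimal NumPy sketch of the classical two-point (SPSA-style) estimator that such methods build on. It is an illustration under our own assumptions, not the paper's method; in particular, ZO Fine-tuner's learned, task-aware perturbation network is not shown, and the fixed Gaussian direction `z` here is exactly the kind of static sampling scheme the paper replaces.

```python
import numpy as np

def zo_grad_estimate(loss_fn, theta, eps=1e-3, rng=None):
    """Two-point SPSA-style zeroth-order gradient estimate.

    Perturbs the parameters along a single random Gaussian direction z,
    evaluates the loss twice, and projects the finite difference back
    onto z. No backward pass (and hence no activation storage) is needed,
    which is the source of the memory savings described in the abstract.
    """
    rng = rng or np.random.default_rng()
    z = rng.standard_normal(theta.shape)          # static sampling: fixed Gaussian direction
    loss_plus = loss_fn(theta + eps * z)          # forward pass at theta + eps*z
    loss_minus = loss_fn(theta - eps * z)         # forward pass at theta - eps*z
    scale = (loss_plus - loss_minus) / (2 * eps)  # directional derivative along z
    return scale * z                              # gradient estimate = (grad . z) * z in expectation

# Usage: one ZO-SGD step on a toy quadratic loss L(w) = sum(w^2)
loss = lambda w: float(np.sum(w ** 2))
w = np.ones(4)
w = w - 0.1 * zo_grad_estimate(loss, w)
```

The estimate is unbiased up to O(eps^2) for the projection of the true gradient onto `z`; a learned optimizer like the one proposed here replaces the isotropic draw of `z` with directions produced by a lightweight network.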