🤖 AI Summary
Existing zeroth-order (ZO) optimizers rely on hand-crafted, static perturbation strategies that adapt poorly to large language model (LLM) architectures, resulting in high memory overhead and limited generalization during fine-tuning. To address this, we propose ZO Fine-tuner, the first learnable ZO framework, which uses meta-learning to automatically discover task-aware perturbation directions via a lightweight neural network, replacing fixed sampling schemes. The framework is trained once per LLM and then reused across diverse downstream tasks without retraining, supports mainstream LLM architectures, and incurs minimal deployment overhead. Extensive experiments across four LLMs and seven benchmark datasets show that ZO Fine-tuner outperforms existing ZO optimizers on 82.1% of task–model combinations, substantially improving the adaptability, efficiency, and scalability of zeroth-order fine-tuning while preserving gradient-free operation.
📝 Abstract
Zeroth-order optimizers have recently emerged as a practical approach for fine-tuning large language models (LLMs), significantly reducing GPU memory consumption compared to traditional first-order methods. Yet existing zeroth-order methods rely on hand-crafted, static sampling strategies that do not adapt to model-specific structures. To address this, we propose ZO Fine-tuner, a learning-based zeroth-order optimizer for LLMs that automatically learns efficient perturbation strategies through a compact and memory-efficient design. Crucially, our approach is motivated by the observation that only a small number of foundation models and their derivatives are widely adopted in practice. Therefore, learning the optimizer once for a given LLM and reusing it across diverse downstream tasks is both feasible and highly desirable. Accordingly, ZO Fine-tuner is designed to scale learning-to-learn (L2L) to the foundation-model era by supporting one-time training per LLM with minimal overhead. Experiments on 4 LLMs and 7 datasets show that ZO Fine-tuner outperforms prior zeroth-order baselines in 82.1% of task–model combinations, demonstrating strong performance and scalability for efficient LLM fine-tuning. Our code is available at https://github.com/ASTRAL-Group/ZO_Fine_tuner.git.
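For readers unfamiliar with zeroth-order fine-tuning, the memory saving comes from estimating gradients with forward passes only: the model is perturbed along a random direction, the loss is evaluated at the two perturbed points, and the finite difference is projected back onto that direction. Below is a minimal NumPy sketch of the classical two-point (SPSA-style) estimator that such methods build on. It is an illustration under our own assumptions, not the paper's method; in particular, ZO Fine-tuner's learned, task-aware perturbation network is not shown, and the fixed Gaussian direction `z` here is exactly the kind of static sampling scheme the paper replaces.

```python
import numpy as np

def zo_grad_estimate(loss_fn, theta, eps=1e-3, rng=None):
    """Two-point SPSA-style zeroth-order gradient estimate.

    Perturbs the parameters along a single random Gaussian direction z,
    evaluates the loss twice, and projects the finite difference back
    onto z. No backward pass (and hence no activation storage) is needed,
    which is the source of the memory savings described in the abstract.
    """
    rng = rng or np.random.default_rng()
    z = rng.standard_normal(theta.shape)          # static sampling: fixed Gaussian direction
    loss_plus = loss_fn(theta + eps * z)          # forward pass at theta + eps*z
    loss_minus = loss_fn(theta - eps * z)         # forward pass at theta - eps*z
    scale = (loss_plus - loss_minus) / (2 * eps)  # directional derivative along z
    return scale * z                              # gradient estimate = (grad . z) * z in expectation

# Usage: one ZO-SGD step on a toy quadratic loss L(w) = sum(w^2)
loss = lambda w: float(np.sum(w ** 2))
w = np.ones(4)
w = w - 0.1 * zo_grad_estimate(loss, w)
```

The estimate is unbiased up to O(eps^2) for the projection of the true gradient onto `z`; a learned optimizer like the one proposed here replaces the isotropic draw of `z` with directions produced by a lightweight network.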