On Adaptivity in Zeroth-Order Optimization

📅 2026-05-05

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work examines adaptive methods in zeroth-order (ZO) optimization for memory-constrained fine-tuning of large language models, where they add substantial memory overhead without a clear convergence benefit. The authors show that high-dimensional ZO gradients exhibit little coordinate-wise heterogeneity, which makes conventional per-coordinate adaptive strategies memory-inefficient. They propose MEAZO, which adapts a global step size using only a single scalar, pairing the optimization performance of ZO-Adam with the memory efficiency of ZO-SGD. Experiments across multiple large language models and tasks show that MEAZO matches the accuracy of ZO-Adam while keeping memory consumption close to that of ZO-SGD, and that it is more robust to the choice of step size.
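
One way to see why coordinate-wise heterogeneity can vanish is a short calculation for a two-point estimator. It assumes a Gaussian perturbation $u \sim \mathcal{N}(0, I_d)$ and the small-smoothing limit, neither of which this summary specifies, so it is an illustration and not necessarily the paper's own argument.

```latex
% Assumed estimator (small-\varepsilon limit of a two-point finite difference):
%   g = \frac{f(x + \varepsilon u) - f(x - \varepsilon u)}{2\varepsilon}\, u
%     \approx \big(u^\top \nabla f(x)\big)\, u, \qquad u \sim \mathcal{N}(0, I_d).
\begin{align}
\mathbb{E}[g_i]
  &= \sum_{j} \partial_j f(x)\, \mathbb{E}[u_j u_i]
   = \partial_i f(x), \\
\mathbb{E}[g_i^2]
  &= \sum_{j} \big(\partial_j f(x)\big)^2\, \mathbb{E}[u_j^2 u_i^2]
   = \|\nabla f(x)\|^2 + 2\,\big(\partial_i f(x)\big)^2 .
\end{align}
% The second line uses E[u_j^2 u_i^2] = 1 for j \ne i and E[u_i^4] = 3;
% cross terms E[u_j u_k u_i^2] with j \ne k vanish.
```

Under these assumptions the per-coordinate second moment is dominated by the shared term $\|\nabla f(x)\|^2$, and the ratio between any two coordinates is at most 3. When the gradient is spread over many coordinates, $(\partial_i f)^2$ is of order $\|\nabla f\|^2 / d$, so a per-coordinate second-moment estimate carries almost the same value everywhere and a single scalar captures most of it.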

📝 Abstract

We investigate the effectiveness of adaptive zeroth-order (ZO) optimization for memory-constrained fine-tuning of large language models (LLMs). Contrary to prior claims, we show that adaptive ZO methods such as ZO-Adam offer no convergence advantage over well-tuned ZO-SGD, while incurring significant memory overhead. Our analysis reveals that in high dimensions, ZO gradients lack coordinate-wise heterogeneity, rendering adaptive mechanisms memory inefficient. Leveraging this insight, we propose MEAZO, a memory-efficient adaptive ZO optimizer that tracks only a single scalar for global step size adaptation. We support our method with theoretical convergence guarantees under standard assumptions. Experiments across multiple LLM families and tasks demonstrate that MEAZO matches ZO-Adam's performance with the memory footprint of ZO-SGD. Additional experiments on synthetic quadratic problems and LLM fine-tuning further demonstrate MEAZO's enhanced robustness to step size choices, particularly in grouped or block-structured optimization settings.
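
The idea of replacing a per-coordinate second-moment vector with one scalar can be sketched as follows. The abstract does not state MEAZO's update rule, so this is a hypothetical illustration, not the authors' algorithm: it assumes a two-point Gaussian ZO estimator and an Adam-style exponential moving average of the mean squared gradient entry. The names `zo_direction`, `scalar_adaptive_zo`, and `quadratic`, and all hyperparameter values, are invented for the example.

```python
import math
import random


def zo_direction(f, x, eps, rng):
    """Two-point ZO estimate: returns (scale, u) with gradient estimate scale * u."""
    u = [rng.gauss(0.0, 1.0) for _ in x]
    x_plus = [xi + eps * ui for xi, ui in zip(x, u)]
    x_minus = [xi - eps * ui for xi, ui in zip(x, u)]
    scale = (f(x_plus) - f(x_minus)) / (2.0 * eps)
    return scale, u


def scalar_adaptive_zo(f, x0, steps=2000, lr=0.01, beta=0.99,
                       eps=1e-3, delta=1e-8, seed=0):
    """ZO descent with ONE scalar second-moment statistic (assumed rule)."""
    rng = random.Random(seed)
    x = list(x0)
    d = len(x)
    v = 0.0  # the only optimizer state besides the parameters
    for t in range(1, steps + 1):
        scale, u = zo_direction(f, x, eps, rng)
        # Mean squared entry of the gradient estimate scale * u.
        g_sq = scale * scale * sum(ui * ui for ui in u) / d
        v = beta * v + (1.0 - beta) * g_sq
        v_hat = v / (1.0 - beta ** t)  # Adam-style bias correction
        step = lr / (math.sqrt(v_hat) + delta)
        x = [xi - step * scale * ui for xi, ui in zip(x, u)]
    return x


def quadratic(x):
    """Toy objective: 0.5 * ||x||^2."""
    return 0.5 * sum(xi * xi for xi in x)


x0 = [1.0] * 10
x_final = scalar_adaptive_zo(quadratic, x0)
print(quadratic(x0), quadratic(x_final))
```

In this sketch the optimizer state is the single float `v`, whereas a ZO-Adam-style method would keep first- and second-moment vectors of the same size as the parameters. For clarity the direction `u` is held in memory; memory-efficient ZO implementations typically regenerate it from a stored random seed instead.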

Problem

Research questions and friction points this paper is trying to address.

zeroth-order optimization

adaptive methods

memory efficiency

large language models

fine-tuning

Innovation

Methods, ideas, or system contributions that make the work stand out.

zeroth-order optimization

memory efficiency

adaptive optimization

large language models

MEAZO
