QuZO: Quantized Zeroth-Order Fine-Tuning for Large Language Models

πŸ“… 2025-02-17
πŸ“ˆ Citations: 0
✨ Influential: 0
πŸ“„ PDF
πŸ€– AI Summary
To address numerical instability in backpropagation and the accumulated bias of straight-through estimators in low-precision (4-/8-bit) large language model (LLM) fine-tuning, this paper proposes the first quantized zeroth-order optimization framework. The method eliminates backpropagation entirely, relying solely on quantized forward passes, and introduces an optimized stochastic rounding mechanism to suppress gradient bias at ultra-low bitwidths, enabling efficient FP8/INT8/INT4 training. Evaluated on GLUE, multiple-choice, and generative tasks, it matches MeZO's performance; for LLaMA2-7B fine-tuning, it reduces memory consumption by 2.94× and surpasses conventional first-order quantized methods in INT4 accuracy. The core contribution is a stable, backpropagation-free, and computationally lightweight quantized fine-tuning paradigm.


πŸ“ Abstract
Large Language Models (LLMs) are often quantized to lower precision to reduce memory cost and inference latency. However, quantization often degrades model performance, so fine-tuning is required for various downstream tasks. Traditional fine-tuning methods such as stochastic gradient descent and Adam optimization require backpropagation, which is error-prone in low-precision settings. To overcome these limitations, we propose the Quantized Zeroth-Order (QuZO) framework, specifically designed for fine-tuning LLMs through low-precision (e.g., 4- or 8-bit) forward passes. Our method avoids the error-prone low-precision straight-through estimator and utilizes optimized stochastic rounding to mitigate the increased bias. QuZO simplifies the training process while achieving results comparable to first-order methods in FP8 and superior accuracy in INT8 and INT4 training. Experiments demonstrate that QuZO's low-bit training achieves performance comparable to MeZO optimization on GLUE, Multi-Choice, and Generation tasks, while reducing memory cost by 2.94× in LLaMA2-7B fine-tuning compared to quantized first-order methods.
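The zeroth-order idea underlying QuZO (as in MeZO) can be sketched with a minimal SPSA-style estimator: the gradient is approximated from two forward-pass loss evaluations under a shared random perturbation, so no backward pass is needed. The function names and toy loss below are illustrative assumptions, not QuZO's actual API or its quantized forward pass.

```python
import numpy as np

def zo_gradient_step(loss_fn, params, lr=0.1, eps=1e-3, seed=0):
    """One SPSA-style zeroth-order step (MeZO-like): estimate the gradient
    from two forward passes with a shared Gaussian perturbation z,
    grad ≈ ((L(θ+εz) − L(θ−εz)) / 2ε) · z, then apply an SGD update.
    Names here are illustrative, not QuZO's actual interface."""
    rng = np.random.default_rng(seed)
    z = rng.standard_normal(params.shape)   # shared perturbation direction
    loss_plus = loss_fn(params + eps * z)   # forward pass 1
    loss_minus = loss_fn(params - eps * z)  # forward pass 2
    grad_est = (loss_plus - loss_minus) / (2 * eps) * z
    return params - lr * grad_est

# Toy usage: minimize a quadratic "loss" with forward passes only.
loss = lambda p: float(np.sum(p ** 2))
w = np.array([3.0, -2.0])
for step in range(200):
    w = zo_gradient_step(loss, w, seed=step)
```

Because only forward evaluations are stored, the optimizer's memory footprint is dominated by the (quantized) weights themselves rather than activations and optimizer states, which is the source of the reported memory savings.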
Problem

Research questions and friction points this paper is trying to address.

Quantize LLMs for memory efficiency
Avoid error-prone low-precision backpropagation
Fine-tune LLMs with reduced memory cost
Innovation

Methods, ideas, or system contributions that make the work stand out.

Quantized Zeroth-Order framework
Low-precision forward passes
Optimized stochastic rounding
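The stochastic-rounding ingredient can be illustrated with a generic unbiased rounder: a value is rounded up with probability equal to its fractional part, so the quantization error is zero in expectation and does not accumulate as systematic bias across zeroth-order steps. This is a standard sketch under assumed uniform scaling, not QuZO's optimized variant.

```python
import numpy as np

def stochastic_round(x, scale, seed=0):
    """Quantize x/scale to integers with stochastic rounding:
    round up with probability equal to the fractional part, so
    E[stochastic_round(x, s) * s] == x (unbiased in expectation).
    Generic illustration; QuZO uses an optimized variant."""
    rng = np.random.default_rng(seed)
    scaled = x / scale
    floor = np.floor(scaled)
    frac = scaled - floor
    return (floor + (rng.random(x.shape) < frac)).astype(np.int32)

# 0.3 rounds to 1 about 30% of the time and to 0 otherwise,
# so the mean of the quantized values stays near the true value.
x = np.full(100_000, 0.3)
q = stochastic_round(x, scale=1.0)
```

Round-to-nearest, by contrast, maps every 0.3 to 0, introducing a deterministic bias that a biased gradient estimate would compound at 4-/8-bit precision.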
πŸ”Ž Similar Papers
No similar papers found.