Quantized Evolution Strategies: High-precision Fine-tuning of Quantized LLMs at Low-precision Cost

📅 2026-02-03
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the challenge of efficiently fine-tuning quantized large language models, whose discrete and non-differentiable parameters preclude effective optimization via conventional backpropagation. The authors propose a zeroth-order optimization paradigm that enables direct full-parameter fine-tuning of quantized models without gradient backpropagation. By integrating Evolution Strategies with a cumulative error feedback mechanism, the method preserves high-fidelity gradient-like signals, while a stateless seed replay technique reduces memory overhead to levels comparable to low-precision inference. Evaluated on arithmetic reasoning tasks, the approach significantly outperforms existing zeroth-order fine-tuning methods, achieving high-accuracy adaptation while maintaining the low computational cost characteristic of quantized inference.
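To make the two mechanisms concrete, here is a minimal sketch of a zeroth-order ES update on quantized weights with cumulative error feedback, as the summary describes it. All names (`quantize`, `qes_step`), the uniform quantizer, and the hyperparameters are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def quantize(w, scale=0.1):
    # Toy uniform quantizer: snap weights to a discrete grid
    # (a stand-in for real post-training quantization).
    return np.round(w / scale) * scale

def qes_step(w_q, err, loss_fn, sigma=0.2, lr=0.05, pop=16, rng=None):
    """One hypothetical QES-style update.

    Antithetic ES perturbations are evaluated in the quantized space,
    while the high-precision residual lost to rounding is carried in
    `err` (cumulative error feedback), so small updates are not
    silently discarded by the quantizer.
    """
    rng = rng or np.random.default_rng(0)
    grad_est = np.zeros_like(w_q)
    for _ in range(pop):
        eps = rng.standard_normal(w_q.shape)
        # Antithetic pair: evaluate the loss on quantized candidates.
        f_pos = loss_fn(quantize(w_q + sigma * eps))
        f_neg = loss_fn(quantize(w_q - sigma * eps))
        grad_est += (f_pos - f_neg) / (2 * sigma * pop) * eps
    # High-precision "virtual" weights = quantized weights + carried error.
    virtual = w_q + err - lr * grad_est
    w_new = quantize(virtual)
    err_new = virtual - w_new  # residual fed back into the next step
    return w_new, err_new
```

Without the `err` term, any update smaller than half the grid spacing would round away to nothing; the feedback buffer accumulates such updates until they are large enough to move a quantized weight, which is the vanishing-gradient failure mode the summary alludes to.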

📝 Abstract
Post-Training Quantization (PTQ) is essential for deploying Large Language Models (LLMs) on memory-constrained devices, yet it renders models static and difficult to fine-tune. Standard fine-tuning paradigms, including Reinforcement Learning (RL), fundamentally rely on backpropagation and high-precision weights to compute gradients, and thus cannot be applied to quantized models, where the parameter space is discrete and non-differentiable. While Evolution Strategies (ES) offer a backpropagation-free alternative, optimization of the quantized parameters can still fail due to vanishing or inaccurate gradient estimates. This paper introduces Quantized Evolution Strategies (QES), an optimization paradigm that performs full-parameter fine-tuning directly in the quantized space. QES is based on two innovations: (1) it integrates accumulated error feedback to preserve high-precision gradient signals, and (2) it utilizes stateless seed replay to reduce memory usage to low-precision inference levels. QES significantly outperforms the state-of-the-art zeroth-order fine-tuning method on arithmetic reasoning tasks, making direct fine-tuning of quantized models possible. It therefore opens up the possibility of scaling up LLMs entirely in the quantized space. The source code is available at https://github.com/dibbla/Quantized-Evolution-Strategies.
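The abstract's second innovation, stateless seed replay, rests on a standard ES trick: a perturbation drawn from a seeded PRNG can be regenerated from its integer seed, so the optimizer stores only seeds and scalar fitnesses rather than full noise vectors. The sketch below illustrates the idea under assumed names (`perturb`, `es_update_from_seeds`); it is not the paper's implementation.

```python
import numpy as np

def perturb(w_q, seed, sigma=0.2):
    # Regenerate the perturbation from its seed on demand instead of
    # keeping the noise vector in memory (stateless seed replay).
    eps = np.random.default_rng(seed).standard_normal(w_q.shape)
    return w_q + sigma * eps

def es_update_from_seeds(w_q, seeds, fitnesses, sigma=0.2, lr=0.05):
    """Vanilla ES update reconstructed purely from (seed, fitness) pairs.

    Memory cost is O(population) integers and scalars on top of the
    quantized weights themselves, i.e. close to inference-level usage.
    """
    g = np.zeros_like(w_q)
    for s, f in zip(seeds, fitnesses):
        # Replaying the seed yields the exact same eps used at eval time.
        eps = np.random.default_rng(s).standard_normal(w_q.shape)
        g += f * eps
    g /= (len(seeds) * sigma)
    return w_q - lr * g
```

Because `default_rng(seed)` is deterministic, the update computed here is bit-identical to one computed with the noise vectors kept in memory, which is what allows the memory footprint to stay at low-precision inference levels.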
Problem

Research questions and friction points this paper is trying to address.

Quantized LLMs
Post-Training Quantization
Fine-tuning
Evolution Strategies
Discrete Optimization
Innovation

Methods, ideas, or system contributions that make the work stand out.

Quantized Evolution Strategies
Post-Training Quantization
Zeroth-order Optimization
Error Feedback
Memory-efficient Fine-tuning