JacQuant: STE-Free Quantization-Aware Training via Learned Jacobian Surrogates

📅 2026-05-25
🤖 AI Summary
This work addresses the instability and behavioral mismatch of conventional quantization-aware training (QAT), which relies on the straight-through estimator (STE) and becomes brittle near quantization bin boundaries. The authors propose JacQuant, a framework that replaces STE with a lightweight, data-driven, learned Jacobian surrogate (diagonal or block-diagonal) capturing the model's local sensitivity to parameter changes. The approach is drop-in, requires no changes to the forward quantizers, is compatible with common weight and activation quantizers, and runs within standard variance-reduced optimizers. On LLM benchmarks at ≤2 bits, JacQuant consistently reaches higher accuracy than STE-based QAT. On code-preserving training phases, the authors prove convergence for non-convex objectives, with linear rates under a PL condition. The added runtime cost remains negligible under practical group sizes.
📝 Abstract
Quantization-aware training (QAT) is widely deployed but typically relies on the Straight-Through Estimator (STE), which passes gradients through non-differentiable quantizers by fiat. This often makes training brittle near bin boundaries and weakly aligned with the actual behavior of the low-precision model. We introduce JacQuant, a QAT framework that learns a lightweight surrogate of the model's local sensitivity to parameter changes and uses it to stabilize and accelerate training within standard variance-reduced optimizers. The surrogate is inexpensive (diagonal or block-diagonal), data-driven, and compatible with common weight and activation quantizers. On code-preserving training phases, we prove convergence for non-convex objectives and obtain linear rates under a PL condition, and we relate the learned sensitivity to end-to-end output fidelity via a simple calibration argument. Across LLM benchmarks at $\leq 2$ bits, JacQuant consistently reaches higher accuracy than STE-based QAT, and the runtime analyses on various models show that the added cost remains negligible under practical group sizes. The method is drop-in and requires no changes to the forward quantizers; our empirical claims are scoped to ultra-low-bit LLM QAT.
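To make the contrast between STE and a diagonal sensitivity surrogate concrete, here is a minimal NumPy sketch. It is not the authors' algorithm: the abstract does not say how the surrogate is learned, so `fit_diag_surrogate` is a hypothetical stand-in that estimates a per-weight local slope of the quantizer by least squares under small Gaussian perturbations. The quantizer settings (`step`, `levels`), the perturbation scale `sigma`, and all function names are assumptions chosen for illustration.

```python
import numpy as np

def quantize(w, step=0.5, levels=2):
    """Uniform symmetric quantizer: round to the grid, then clip to +/- levels*step."""
    return np.clip(np.round(w / step), -levels, levels) * step

def ste_backward(grad_q, w, step=0.5, levels=2):
    """STE backward: treat the quantizer as identity inside the clip range, zero outside."""
    inside = np.abs(w) <= levels * step
    return grad_q * inside

def fit_diag_surrogate(w, step=0.5, levels=2, sigma=0.05, n_samples=4096, seed=0):
    """Hypothetical estimator (an assumption, not the paper's method):
    per-weight least-squares slope of Q(w + eps) - Q(w) against eps."""
    rng = np.random.default_rng(seed)
    eps = rng.normal(0.0, sigma, size=(n_samples, w.size))
    dq = quantize(w + eps, step, levels) - quantize(w, step, levels)
    return (dq * eps).sum(axis=0) / (eps ** 2).sum(axis=0)

def surrogate_backward(grad_q, diag):
    """Backward pass through a diagonal Jacobian surrogate: elementwise scaling."""
    return grad_q * diag

# Weights: deep inside a bin, just below and just above a bin boundary (0.25),
# and far outside the clip range.
w = np.array([0.02, 0.24, 0.26, 3.0])
grad_q = np.ones_like(w)  # upstream gradient w.r.t. the quantized weights

g_ste = ste_backward(grad_q, w)
diag = fit_diag_surrogate(w)
g_sur = surrogate_backward(grad_q, diag)

print("STE gradient:      ", g_ste)
print("surrogate diagonal:", np.round(diag, 3))
print("surrogate gradient:", np.round(g_sur, 3))
```

In this toy setting STE assigns the same unit gradient to the weight in the middle of a bin and to the two weights next to a boundary, whereas the perturbation-based diagonal is near zero for the former and large for the latter. A diagonal surrogate costs only an elementwise multiply in the backward pass, which is consistent with the abstract's description of the surrogate as inexpensive; a block-diagonal variant would replace the elementwise product with a small matrix product per group.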
Problem

Research questions and friction points this paper is trying to address.

Quantization-Aware Training
Straight-Through Estimator
Low-Precision Models
Non-Differentiable Quantizers
Ultra-Low-Bit LLM
Innovation

Methods, ideas, or system contributions that make the work stand out.

Quantization-Aware Training
Jacobian Surrogate
Straight-Through Estimator
Ultra-Low-Bit Quantization
Large Language Models