🤖 AI Summary
This work addresses the pronounced performance degradation of large language models under low-bit (3–4 bit) post-training quantization, which the authors attribute to the heuristic objectives and greedy rounding used by existing methods. They propose a formulation that models weight quantization as a joint optimization problem over both activations and weights, cast layer-wise as a box-constrained integer least squares problem with multiple right-hand sides. To find near-optimal integer solutions efficiently, they extend the Babai nearest-plane algorithm with Klein-style randomization and a K-Best sampling strategy. The resulting method substantially reduces perplexity at 3–4 bits, outperforming existing post-training quantization approaches at comparable computational cost.
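The layer-wise reduction described above rests on a standard identity: the joint loss over a layer's activations and weights splits into independent per-column least-squares residuals against a triangular factor of the activation Gram matrix. A minimal numpy sketch of that identity (array shapes, variable names, and the round-to-nearest stand-in are illustrative, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((32, 8))   # calibration activations (illustrative shapes)
W = rng.standard_normal((8, 4))    # full-precision weights, one column per output unit

# Cholesky of the proxy Hessian H = X^T X gives an upper-triangular R with H = R^T R.
R = np.linalg.cholesky(X.T @ X).T

Wq = np.round(W)                   # naive round-to-nearest stand-in for a quantizer

# The joint layer loss splits into independent per-column residuals,
# so each weight column can be solved as its own integer least squares problem.
loss_joint = np.linalg.norm(X @ (Wq - W), "fro") ** 2
loss_cols = sum(np.linalg.norm(R @ (Wq[:, c] - W[:, c])) ** 2
                for c in range(W.shape[1]))
assert np.isclose(loss_joint, loss_cols)
```

This decomposition is what makes a per-column solver (such as a Babai-type rounding) applicable to the multiple-right-hand-side problem.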
📝 Abstract
Post-training quantization (PTQ) is widely used to compress large language models without retraining. However, many existing weight-only methods rely on heuristic objectives and greedy rounding, leading to noticeable degradation under low-bit quantization. In this work, we introduce OJBKQ (Objective-Joint Babai-Klein Quantization with K-Best Sampling), a layer-wise PTQ method that formulates weight quantization as a joint optimization problem over activations and weights. This formulation yields, in each layer, a multiple-right-hand-side box-constrained integer least squares (BILS) problem, which is NP-hard. For each column of the weight matrix, we apply an extended Babai nearest-plane algorithm and an extended version of Klein's randomized Babai algorithm to find the minimum-residual Babai-Klein point, a sub-optimal solution to the BILS problem. Experimental results on large language models show that OJBKQ achieves lower perplexity at 3–4 bits than existing PTQ approaches, while maintaining comparable computational cost.
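The classical Babai nearest-plane step underlying the method rounds one coordinate at a time, from last to first, against an upper-triangular factor; clamping each rounded value to the quantization grid gives a box-constrained variant. A minimal sketch of that deterministic baseline step (the function name and interface are hypothetical; the paper's extensions, Klein randomization, and K-Best sampling are not shown):

```python
import numpy as np

def babai_box(R, y, lo, hi):
    """Box-constrained Babai nearest-plane rounding.

    Approximately solves argmin_z ||R z - y|| over integer vectors z with
    entries in [lo, hi], where R is upper triangular (e.g. a Cholesky
    factor of the activation Gram matrix).
    """
    n = R.shape[0]
    z = np.zeros(n, dtype=int)
    for k in range(n - 1, -1, -1):
        # Residual in coordinate k after fixing the already-rounded z[k+1:].
        resid = y[k] - R[k, k + 1:] @ z[k + 1:]
        # Round to the nearest integer, then clamp to the box constraint.
        z[k] = int(np.clip(np.round(resid / R[k, k]), lo, hi))
    return z

# Toy example: with R = I this reduces to round-to-nearest plus clamping.
print(babai_box(np.eye(3), np.array([0.4, 2.6, -1.2]), 0, 3))  # [0 3 0]
```

A Klein-style variant would replace the deterministic rounding with randomized sampling around `resid / R[k, k]`, and a K-Best strategy would keep several candidate partial solutions per step instead of one; the abstract's extended algorithms build on this basic recursion.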