🤖 AI Summary
Existing post-training quantization (PTQ) methods for large language models (LLMs) suffer severe accuracy degradation at ultra-low bit-widths (e.g., W4A4), with diminishing marginal gains. This work introduces **nullspace optimization**, the first application of this principle to PTQ: by constraining quantization-induced weight perturbations to lie within the nullspace of input activations, accuracy loss can be theoretically suppressed. To realize this, we propose a plug-and-play **Q2N module** and derive a closed-form nullspace projection solution that incurs no additional memory overhead. Integrating our method with LLM activation statistics modeling and standard PTQ frameworks, we achieve substantial improvements over state-of-the-art PTQ approaches on LLaMA-3, DeepSeek, and Qwen-3 across multiple benchmarks. The implementation is publicly available.
📄 Abstract
Existing post-training quantization methods for large language models (LLMs) have achieved remarkable success. However, their increasingly marginal performance gains suggest that existing quantization strategies are insufficient to support the development of more compressed models. To inspire new directions for future research, this paper introduces the concept of the null space into LLM quantization. We argue that quantization error can be effectively alleviated by constraining the post-quantization weight perturbation to lie within the null space of input activations. To validate this idea, we propose Q2N, a plug-and-play null space projection module for existing milestone PTQ baselines. Specifically, we first design an efficient and accurate null space projection approximation method tailored to the characteristics of LLMs. We then theoretically derive a closed-form solution for an equivalent vector of the obtained projection matrix, which satisfies practical inference conditions while avoiding additional memory overhead. Extensive experiments on various state-of-the-art LLMs (LLaMA3, DeepSeek, Qwen3) and baselines demonstrate the effectiveness of both Q2N and the null-space-optimization perspective on LLM quantization. We view this paper as a first step toward further alleviating quantization error based on null-space insights, and hope it inspires future researchers to design more advanced quantization methods. Code is available at https://github.com/zjq0455/q2n.
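The core claim can be sketched numerically. The following is a minimal NumPy illustration of the null-space idea, not the paper's Q2N implementation: if the quantization-induced weight perturbation `dW` is projected onto the null space of the calibration activations `X`, then `X @ dW = 0` and the layer output `X @ W` is left exactly unchanged. All dimensions, names, and the SVD-based projector below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n_tokens, d_in, d_out = 32, 64, 16          # fewer tokens than input dims => nontrivial null space
X = rng.standard_normal((n_tokens, d_in))   # stand-in for calibration activations
W = rng.standard_normal((d_in, d_out))      # stand-in for a linear layer's weights

# Build a projector onto null(X) from the SVD: right singular vectors whose
# singular values are (near-)zero span the null space of X.
_, s, Vt = np.linalg.svd(X, full_matrices=True)
null_mask = np.ones(d_in, dtype=bool)
null_mask[: len(s)] = s < 1e-10
N = Vt[null_mask].T                         # orthonormal basis of null(X), shape (d_in, d_in - rank)
P = N @ N.T                                 # orthogonal projector onto null(X)

dW = 0.1 * rng.standard_normal((d_in, d_out))  # raw quantization perturbation
dW_proj = P @ dW                               # perturbation constrained to null(X)

err_raw = np.linalg.norm(X @ (W + dW) - X @ W)
err_proj = np.linalg.norm(X @ (W + dW_proj) - X @ W)
print(err_raw, err_proj)                       # err_proj is ~0 up to float rounding
```

In practice LLM calibration sets have far more tokens than hidden dimensions, so the exact null space is trivial; the paper's contribution lies in an approximate projection suited to real activation statistics and a memory-free closed form, which this toy sketch does not attempt to reproduce.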