Widening the Gap: Exploiting LLM Quantization via Outlier Injection

📅 2026-05-14

🤖 AI Summary
Existing attacks have struggled against mainstream advanced quantization methods such as AWQ, GPTQ, and GGUF I-quants, so the security risks of these methods have been underestimated. This work proposes a universal adversarial attack that exploits a mechanism shared by many modern quantization methods: a large outlier forces other weights to be rounded to zero. By injecting carefully crafted outliers at critical positions in the weights, the attacker makes the quantized model exhibit predetermined malicious behaviors while the full-precision model appears benign. It is the first approach to mount a single attack across multiple state-of-the-art quantization schemes, and it significantly outperforms existing techniques in three distinct attack scenarios on several large language models. The results show that even sophisticated quantization strategies remain broadly vulnerable to adversarial manipulation.
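To make the rounding-to-zero mechanism concrete, consider the simplest case it covers: symmetric absmax round-to-nearest quantization of one weight block $\mathbf{w} = (w_1, \dots, w_n)$ to $b$ bits. This textbook scheme is an assumption chosen for illustration, not the quantizer used by AWQ, GPTQ, or I-quants; the derivation only shows the threshold below which a weight collapses once an outlier of magnitude $M$ sets the block scale.

```latex
s = \frac{\max_i |w_i|}{2^{b-1} - 1}, \qquad
\hat{w}_i = s \cdot \operatorname{round}\!\left(\frac{w_i}{s}\right)

% Injecting an outlier of magnitude M \ge \max_i |w_i| into the block gives s = M / (2^{b-1} - 1), so
\hat{w}_i = 0 \quad \text{whenever} \quad |w_i| < \frac{s}{2} = \frac{M}{2\,(2^{b-1} - 1)}

% Example: b = 4 gives 2^{b-1} - 1 = 7; with M = 14, s = 2, so every other weight with |w_i| < 1 is rounded to 0.
```

The larger the injected outlier, the larger the collapse threshold, which is why a single large value can erase the useful information in the rest of its block.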
📝 Abstract
LLM quantization has become essential for memory-efficient deployment. Recent work has shown that quantization schemes can pose critical security risks: an adversary may release a model that appears benign in full precision but exhibits malicious behavior once quantized by users. However, existing quantization-conditioned attacks have been limited to relatively simple quantization methods, where the attacker can estimate weight regions that remain invariant under the target quantization. Notably, prior attacks have consistently failed to compromise more popular and sophisticated schemes, limiting their practical impact. In this work, we introduce the first quantization-conditioned attack that consistently induces malicious behavior that can be triggered by a broad range of advanced quantization techniques, including AWQ, GPTQ, and GGUF I-quants. Our attack exploits a simple property shared by many modern quantization methods: large outliers can cause other weights to be rounded to zero. Consequently, by injecting outliers into specific weight blocks, an adversary can induce a targeted, predictable weight collapse in the model. This effect can be used to craft seemingly benign full-precision models that exhibit a wide range of malicious behaviors after quantization. Through extensive evaluation across three attack scenarios and multiple LLMs, we show that our attack achieves high success rates against a broad range of quantization methods on which prior attacks fail. Our results demonstrate, for the first time, that the security risks of quantization are not restricted to simpler schemes but are broadly relevant across complex, widely used quantization methods.
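The claim that an outlier injected into a chosen weight block produces a targeted, predictable collapse can be illustrated with a toy block-wise quantizer. The sketch below assumes plain symmetric absmax round-to-nearest quantization with 4 bits and a block size of 4. The helpers `quantize_block`, `quantize_row`, and `inject_outlier` and the example weights are hypothetical. AWQ, GPTQ, and GGUF I-quants add their own scaling, error-compensation, or codebook machinery, so this shows only the shared rounding effect, not the authors' attack.

```python
# Hypothetical illustration: symmetric absmax round-to-nearest (RTN) quantization
# applied block-wise. A simplified stand-in, NOT AWQ, GPTQ, or GGUF I-quants.


def quantize_block(block, bits=4):
    """Quantize and dequantize one block using a single absmax scale."""
    qmax = 2 ** (bits - 1) - 1  # 7 for 4-bit symmetric quantization
    amax = max(abs(w) for w in block)
    if amax == 0:
        return [0.0] * len(block)
    scale = amax / qmax  # one outlier inflates this scale for the whole block
    # round() is round-half-to-even in Python; real kernels may break ties differently.
    return [scale * max(-qmax, min(qmax, round(w / scale))) for w in block]


def quantize_row(row, block_size=4, bits=4):
    """Split a weight row into contiguous blocks and quantize each independently."""
    out = []
    for start in range(0, len(row), block_size):
        out.extend(quantize_block(row[start:start + block_size], bits))
    return out


def inject_outlier(row, block_index, magnitude, block_size=4):
    """Hypothetical helper: overwrite the first weight of one block with a large outlier."""
    poisoned = list(row)
    poisoned[block_index * block_size] = magnitude
    return poisoned


# Two blocks of small weights; target only block 1 (indices 4..7).
row = [0.3, -0.5, 0.2, 0.4, 0.6, -0.1, 0.35, -0.45]
clean = quantize_row(row)
poisoned = quantize_row(inject_outlier(row, block_index=1, magnitude=20.0))

print(clean)     # every weight survives with a small rounding error
print(poisoned)  # block 0 unchanged; block 1 becomes roughly [20.0, 0.0, 0.0, 0.0]
```

Because each contiguous block has its own scale, the outlier only inflates the scale of the block it sits in, so the collapse stays local and the attacker can predict exactly which weights are zeroed.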
Problem

Research questions and friction points this paper is trying to address.

LLM quantization
quantization-conditioned attack
security risk
outlier injection
malicious behavior
Innovation

Methods, ideas, or system contributions that make the work stand out.

quantization-conditioned attack
outlier injection
weight collapse
LLM security
model quantization