Adversarial Contrastive Learning for LLM Quantization Attacks

📅 2026-01-06
🏛️ arXiv.org
📈 Citations: 1
Influential: 0
🤖 AI Summary
While model quantization reduces the deployment cost of large language models, it may introduce security vulnerabilities that trigger malicious behaviors in quantized models. This work proposes an Adversarial Contrastive Learning (ACL) framework, which, for the first time, incorporates triplet contrastive loss into quantization-aware attacks and combines it with a two-stage distributed fine-tuning strategy based on projected gradient descent to explicitly widen the probability gap between benign and harmful responses. The method achieves attack success rates of 86.00%, 97.69%, and 92.40% on over-refusal, jailbreak, and ad-injection attacks, respectively—improving upon existing approaches by up to 50.80% and significantly enhancing both attack effectiveness and stability.

📝 Abstract
Model quantization is critical for deploying large language models (LLMs) on resource-constrained hardware, yet recent work has revealed a severe security risk: LLMs that are benign in full precision may exhibit malicious behaviors after quantization. In this paper, we propose Adversarial Contrastive Learning (ACL), a novel gradient-based quantization attack that achieves superior attack effectiveness by explicitly maximizing the gap between the probabilities of benign and harmful responses. ACL formulates the attack objective as a triplet-based contrastive loss and integrates it with a two-stage distributed fine-tuning strategy based on projected gradient descent to ensure stable and efficient optimization. Extensive experiments demonstrate ACL's remarkable effectiveness, achieving attack success rates of 86.00% for over-refusal, 97.69% for jailbreak, and 92.40% for advertisement injection, substantially outperforming state-of-the-art methods by up to 44.67%, 18.84%, and 50.80%, respectively.
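The two ingredients the abstract names can be sketched in a few lines. Below is a minimal, hypothetical illustration (not the paper's implementation): a hinge-style triplet loss that widens the log-probability gap between a target response and a response to avoid, and a projected-gradient-descent step that clips fine-tuned weights back into a ball around the original weights. Keeping that radius below half the quantization step size is what lets attacked full-precision weights still round to the same quantized values; all function names and parameters here are assumptions for illustration.

```python
import numpy as np

def triplet_gap_loss(logp_target, logp_avoid, margin=1.0):
    """Hinge-style triplet loss: zero once the target response's
    log-probability exceeds the avoided response's by at least
    `margin`, encouraging an explicit probability gap."""
    return max(0.0, margin - (logp_target - logp_avoid))

def pgd_project(weights, reference, radius):
    """PGD projection: clip updated weights into an L-infinity ball
    of the given radius around the reference (pre-attack) weights.
    A radius under half the quantization step keeps each weight in
    the same quantization bucket, so the full-precision model's
    rounded (quantized) version is unchanged by the projection."""
    return np.clip(weights, reference - radius, reference + radius)

# Toy usage: a harmful response already 1.5 nats more likely than the
# benign one incurs no loss; equal likelihoods incur the full margin.
loss_wide_gap = triplet_gap_loss(logp_target=2.0, logp_avoid=0.5)
loss_no_gap = triplet_gap_loss(logp_target=0.5, logp_avoid=0.5)

w_ref = np.zeros(4)
w_attacked = np.array([0.3, -0.05, 0.02, -0.4])
w_projected = pgd_project(w_attacked, w_ref, radius=0.1)
```

In the actual attack, the loss would be computed from model log-likelihoods of full response sequences and the projection applied after each gradient step; this sketch only shows the shape of the two operations.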
Problem

Research questions and friction points this paper is trying to address.

LLM quantization
security risks
adversarial attacks
malicious behavior
model quantization
Innovation

Methods, ideas, or system contributions that make the work stand out.

Adversarial Contrastive Learning
LLM Quantization Attack
Contrastive Loss
Projected Gradient Descent
Quantization Security