Adversarial Contrastive Learning for LLM Quantization Attacks

📅 2026-01-06
🏛️ arXiv.org
📈 Citations: 1
Influential: 0
🤖 AI Summary
While model quantization reduces the deployment cost of large language models, it may introduce security vulnerabilities that trigger malicious behaviors in quantized models. This work proposes an Adversarial Contrastive Learning (ACL) framework, which, for the first time, incorporates triplet contrastive loss into quantization-aware attacks and combines it with a two-stage distributed fine-tuning strategy based on projected gradient descent to explicitly widen the probability gap between benign and harmful responses. The method achieves attack success rates of 86.00%, 97.69%, and 92.40% on over-refusal, jailbreak, and ad-injection attacks, respectively—improving upon existing approaches by up to 50.80% and significantly enhancing both attack effectiveness and stability.

📝 Abstract
Model quantization is critical for deploying large language models (LLMs) on resource-constrained hardware, yet recent work has revealed a severe security risk: LLMs that are benign in full precision may exhibit malicious behaviors after quantization. In this paper, we propose Adversarial Contrastive Learning (ACL), a novel gradient-based quantization attack that achieves superior attack effectiveness by explicitly maximizing the gap between the probabilities of benign and harmful responses. ACL formulates the attack objective as a triplet-based contrastive loss and integrates it with a two-stage distributed fine-tuning strategy based on projected gradient descent to ensure stable and efficient optimization. Extensive experiments demonstrate ACL's remarkable effectiveness, achieving attack success rates of 86.00% for over-refusal, 97.69% for jailbreak, and 92.40% for advertisement injection, substantially outperforming state-of-the-art methods by up to 44.67%, 18.84%, and 50.80%, respectively.
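The two ingredients the abstract names can be sketched in a few lines. Below is a minimal, hypothetical illustration (not the paper's implementation): a hinge-style triplet loss that widens the log-probability gap between a target response and a response to avoid, and a projected-gradient-descent step that clips fine-tuned weights back into a ball around the original weights. Keeping that radius below half the quantization step size is what lets attacked full-precision weights still round to the same quantized values; all function names and parameters here are assumptions for illustration.

```python
import numpy as np

def triplet_gap_loss(logp_target, logp_avoid, margin=1.0):
    """Hinge-style triplet loss: zero once the target response's
    log-probability exceeds the avoided response's by at least
    `margin`, encouraging an explicit probability gap."""
    return max(0.0, margin - (logp_target - logp_avoid))

def pgd_project(weights, reference, radius):
    """PGD projection: clip updated weights into an L-infinity ball
    of the given radius around the reference (pre-attack) weights.
    A radius under half the quantization step keeps each weight in
    the same quantization bucket, so the full-precision model's
    rounded (quantized) version is unchanged by the projection."""
    return np.clip(weights, reference - radius, reference + radius)

# Toy usage: a harmful response already 1.5 nats more likely than the
# benign one incurs no loss; equal likelihoods incur the full margin.
loss_wide_gap = triplet_gap_loss(logp_target=2.0, logp_avoid=0.5)
loss_no_gap = triplet_gap_loss(logp_target=0.5, logp_avoid=0.5)

w_ref = np.zeros(4)
w_attacked = np.array([0.3, -0.05, 0.02, -0.4])
w_projected = pgd_project(w_attacked, w_ref, radius=0.1)
```

In the actual attack, the loss would be computed from model log-likelihoods of full response sequences and the projection applied after each gradient step; this sketch only shows the shape of the two operations.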
Problem

Research questions and friction points this paper is trying to address.

LLM quantization
security risks
adversarial attacks
malicious behavior
model quantization
Innovation

Methods, ideas, or system contributions that make the work stand out.

Adversarial Contrastive Learning
LLM Quantization Attack
Contrastive Loss
Projected Gradient Descent
Quantization Security