Efficient but Vulnerable: Benchmarking and Defending LLM Batch Prompting Attack

📅 2025-03-18

📈 Citations: 0

✨ Influential: 0

career value

208K/year

🤖 AI Summary

This work identifies a cross-query instruction injection vulnerability in large language model (LLM) batch prompting: malicious users can embed adversarial instructions within a batch, causing contamination across all queries—e.g., generating phishing links or logical errors. To address this, the authors formally characterize the attack mechanism and introduce BATCHSAFEBENCH, a benchmark comprising 150 attack categories and 8,000 instances. They propose an attention-probing–based detection method achieving 95% attack identification accuracy across multiple LLMs. Through attribution analysis, they identify fragile attention heads exhibiting cross-layer consistency. Experiments demonstrate that all mainstream LLMs are highly vulnerable under batch inference. The proposed probe-based defense and mechanistic analysis establish a novel paradigm for securing batched LLM inference and advancing safety alignment.

Technology Category

Application Category

📝 Abstract

Batch prompting, which combines a batch of multiple queries sharing the same context in one inference, has emerged as a promising solution to reduce inference costs. However, our study reveals a significant security vulnerability in batch prompting: malicious users can inject attack instructions into a batch, leading to unwanted interference across all queries, which can result in the inclusion of harmful content, such as phishing links, or the disruption of logical reasoning. In this paper, we construct BATCHSAFEBENCH, a comprehensive benchmark comprising 150 attack instructions of two types and 8k batch instances, to study the batch prompting vulnerability systematically. Our evaluation of both closed-source and open-weight LLMs demonstrates that all LLMs are susceptible to batch-prompting attacks. We then explore multiple defending approaches. While the prompting-based defense shows limited effectiveness for smaller LLMs, the probing-based approach achieves about 95% accuracy in detecting attacks. Additionally, we perform a mechanistic analysis to understand the attack and identify attention heads that are responsible for it.

Problem

Research questions and friction points this paper is trying to address.

Identifies security vulnerability in batch prompting for LLMs

Develops benchmark to study batch prompting attack susceptibility

Explores and evaluates defense mechanisms against batch prompting attacks

Innovation

Methods, ideas, or system contributions that make the work stand out.

Batch prompting reduces inference costs

BATCHSAFEBENCH benchmarks 150 attack instructions

Probing-based defense detects attacks with 95% accuracy

🔎 Similar Papers

AdvPrompter: Fast Adaptive Adversarial Prompting for LLMs