🤖 AI Summary
This work identifies a security vulnerability in large language models (LLMs) under ultra-long contexts (up to 128K tokens): Many-Shot Jailbreaking (MSJ), in which repeating benign examples, or even inserting random placeholder text with no adversarial content, significantly degrades alignment. Methodologically, the authors systematically vary instruction style, shot density, topic, and formatting, conducting large-scale empirical evaluations across context lengths. The results show that context length is the dominant factor governing attack success: model safety deteriorates sharply as the context grows, exposing fundamental inconsistencies in safe behavior during long-context processing. This provides an empirical demonstration of a structural safety deficiency in the long-context capabilities of current LLMs, indicating that extended context windows can inherently compromise alignment robustness. The findings underscore the need for context-aware safety mechanisms, such as dynamic alignment calibration or context-length-adaptive guardrails, to ensure reliable, safe operation across diverse context scales.
📝 Abstract
We investigate long-context vulnerabilities in Large Language Models (LLMs) through Many-Shot Jailbreaking (MSJ). Our experiments use context lengths of up to 128K tokens. Through comprehensive analysis of many-shot attack settings spanning different instruction styles, shot densities, topics, and formats, we reveal that context length is the primary factor determining attack effectiveness. Critically, we find that successful attacks do not require carefully crafted harmful content: even repetitive shots or random dummy text can circumvent model safety measures, suggesting fundamental limitations in the long-context processing capabilities of LLMs. The safety behavior of well-aligned models becomes increasingly inconsistent as contexts grow longer. These findings highlight significant safety gaps in the context-expansion capabilities of LLMs and emphasize the need for new safety mechanisms.
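To make the attack setup concrete, the two context-filling strategies the abstract describes (repeated benign shots, and random dummy text) can be sketched as simple prompt builders. This is a hypothetical illustration under assumed names and formatting, not the paper's actual harness; the paper's real shot templates, tokenizer, and chat formatting are not specified here.

```python
import random
import string


def build_many_shot_prompt(shots, target_question, n_repeats=100):
    """Assemble a many-shot context by cycling through (question, answer)
    shots n_repeats times before the final target question.

    Per the paper's finding, the shots need not be harmful: even a single
    benign example repeated many times can fill the context window.
    """
    lines = []
    for i in range(n_repeats):
        q, a = shots[i % len(shots)]  # cycle through the provided shots
        lines.append(f"User: {q}\nAssistant: {a}")
    lines.append(f"User: {target_question}\nAssistant:")
    return "\n\n".join(lines)


def build_dummy_context(n_words, target_question, seed=0):
    """Pad the context with random placeholder 'words' instead of shots,
    approximating the random-dummy-text condition."""
    rng = random.Random(seed)  # seeded for reproducibility
    words = [
        "".join(rng.choices(string.ascii_lowercase, k=5))
        for _ in range(n_words)
    ]
    return " ".join(words) + f"\n\nUser: {target_question}\nAssistant:"
```

In an actual evaluation, `n_repeats` or `n_words` would be swept so that the assembled prompt hits target context lengths (e.g. up to 128K tokens, measured with the model's own tokenizer), isolating context length as the variable of interest.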