ICL-EVADER: Zero-Query Black-Box Evasion Attacks on In-Context Learning and Their Defenses

📅 2026-01-29
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This work addresses the pronounced vulnerability of in-context learning (ICL) mechanisms in real-world adversarial settings, particularly the absence of effective attack and defense methods under zero-query black-box conditions. The authors propose ICL-Evader, a novel framework that, for the first time, achieves highly effective evasion attacks—reaching up to 95.3% success rate—without access to model parameters, gradients, or query feedback. It exploits inherent limitations in how large language models process contextual information through three prompt-based attack strategies: Fake Claim, Template, and Needle-in-a-Haystack. Furthermore, the study introduces a unified defense strategy coupled with an automated hardening tool that robustly mitigates all proposed attacks while preserving task accuracy within a 5% degradation margin. This effort establishes the first comprehensive, integrated evaluation framework for ICL-specific adversarial attacks and defenses.

Technology Category

Application Category

📝 Abstract
In-context learning (ICL) has become a powerful, data-efficient paradigm for text classification using large language models. However, its robustness against realistic adversarial threats remains largely unexplored. We introduce ICL-Evader, a novel black-box evasion attack framework that operates under a highly practical zero-query threat model, requiring no access to model parameters, gradients, or query-based feedback during attack generation. We design three novel attacks, Fake Claim, Template, and Needle-in-a-Haystack, that exploit inherent limitations of LLMs in processing in-context prompts. Evaluated across sentiment analysis, toxicity, and illicit promotion tasks, our attacks significantly degrade classifier performance (e.g., achieving up to 95.3% attack success rate), drastically outperforming traditional NLP attacks which prove ineffective under the same constraints. To counter these vulnerabilities, we systematically investigate defense strategies and identify a joint defense recipe that effectively mitigates all attacks with minimal utility loss (<5% accuracy degradation). Finally, we translate our defensive insights into an automated tool that proactively fortifies standard ICL prompts against adversarial evasion. This work provides a comprehensive security assessment of ICL, revealing critical vulnerabilities and offering practical solutions for building more robust systems. Our source code and evaluation datasets are publicly available at: https://github.com/ChaseSecurity/ICL-Evader .
Problem

Research questions and friction points this paper is trying to address.

In-Context Learning
Black-Box Attack
Adversarial Evasion
Zero-Query Threat Model
Large Language Models
Innovation

Methods, ideas, or system contributions that make the work stand out.

In-Context Learning
Zero-Query Attack
Black-Box Evasion
Adversarial Robustness
LLM Security
🔎 Similar Papers
No similar papers found.
N
Ningyuan He
University of Science and Technology of China
R
Ronghong Huang
University of Science and Technology of China
Q
Qianqian Tang
Shandong University
Hongyu Wang
Hongyu Wang
Institute of Computing Technology, Chinese Academy of Sciences
Deep LearningNatural Language ProcessingComputer Vision
Xianghang Mi
Xianghang Mi
USTC
Computer SecurityNetworkingPrivacy
Shanqing Guo
Shanqing Guo
Shandong University