ICL-EVADER: Zero-Query Black-Box Evasion Attacks on In-Context Learning and Their Defenses

📅 2026-01-29

📈 Citations: 0

✨ Influential: 0

career value

194K/year

🤖 AI Summary

This work addresses the pronounced vulnerability of in-context learning (ICL) mechanisms in real-world adversarial settings, particularly the absence of effective attack and defense methods under zero-query black-box conditions. The authors propose ICL-Evader, a novel framework that, for the first time, achieves highly effective evasion attacks—reaching up to 95.3% success rate—without access to model parameters, gradients, or query feedback. It exploits inherent limitations in how large language models process contextual information through three prompt-based attack strategies: Fake Claim, Template, and Needle-in-a-Haystack. Furthermore, the study introduces a unified defense strategy coupled with an automated hardening tool that robustly mitigates all proposed attacks while preserving task accuracy within a 5% degradation margin. This effort establishes the first comprehensive, integrated evaluation framework for ICL-specific adversarial attacks and defenses.

Technology Category

Application Category

📝 Abstract

In-context learning (ICL) has become a powerful, data-efficient paradigm for text classification using large language models. However, its robustness against realistic adversarial threats remains largely unexplored. We introduce ICL-Evader, a novel black-box evasion attack framework that operates under a highly practical zero-query threat model, requiring no access to model parameters, gradients, or query-based feedback during attack generation. We design three novel attacks, Fake Claim, Template, and Needle-in-a-Haystack, that exploit inherent limitations of LLMs in processing in-context prompts. Evaluated across sentiment analysis, toxicity, and illicit promotion tasks, our attacks significantly degrade classifier performance (e.g., achieving up to 95.3% attack success rate), drastically outperforming traditional NLP attacks which prove ineffective under the same constraints. To counter these vulnerabilities, we systematically investigate defense strategies and identify a joint defense recipe that effectively mitigates all attacks with minimal utility loss (<5% accuracy degradation). Finally, we translate our defensive insights into an automated tool that proactively fortifies standard ICL prompts against adversarial evasion. This work provides a comprehensive security assessment of ICL, revealing critical vulnerabilities and offering practical solutions for building more robust systems. Our source code and evaluation datasets are publicly available at: https://github.com/ChaseSecurity/ICL-Evader .

Problem

Research questions and friction points this paper is trying to address.

In-Context Learning

Black-Box Attack

Adversarial Evasion

Zero-Query Threat Model

Large Language Models

Innovation

Methods, ideas, or system contributions that make the work stand out.

In-Context Learning

Zero-Query Attack

Black-Box Evasion