🤖 AI Summary
This work identifies a novel security vulnerability in LLM agents arising from prompt compression modules: adversaries can manipulate the compression process to induce semantic drift and thereby alter model behavior. To exploit this, the authors propose CompressionAttack, the first framework to treat prompt compression as an independent attack surface. It combines two complementary strategies: hard compression (discrete adversarial editing of prompts) and soft compression (gradient-based perturbation in latent space), enabling efficient, stealthy, and cross-model transferable attacks. Experiments across multiple state-of-the-art LLMs demonstrate attack success rates of up to 80% and preference-reversal rates as high as 98%. Case studies confirm real-world impact on production systems, including VSCode Cline and Ollama. Crucially, existing defenses, which target the input or output layers, prove ineffective against such compression-layer attacks.
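To make the hard-compression threat concrete, here is a toy sketch of the idea: a hard compressor keeps only the top-k "important" tokens, and an attacker greedily appends innocuous-looking filler until a safety-critical token is squeezed out. The length-based scorer, the greedy search, and all names below are illustrative assumptions, not the paper's HardCom implementation.

```python
def importance(token):
    # Stand-in scorer: pretend longer tokens are "more important".
    # Real compressors use learned importance estimates.
    return len(token)

def compress(tokens, k=4):
    # Hard compression: keep the k highest-scoring tokens, preserving order.
    keep = sorted(tokens, key=importance, reverse=True)[:k]
    return [t for t in tokens if t in keep]

def hardcom_sketch(tokens, victim, fillers):
    """Greedily append filler tokens until the victim token
    (e.g. a safety instruction) is dropped by the compressor."""
    tokens = list(tokens)
    for f in fillers:
        if victim not in compress(tokens):
            break  # attack succeeded: victim no longer survives compression
        tokens.append(f)
    return tokens

prompt = ["please", "summarize", "safely", "this", "document"]
adv = hardcom_sketch(prompt, victim="safely",
                     fillers=["comprehensively", "meticulously", "exhaustively"])
print("safely" in compress(prompt))  # True: survives in the clean prompt
print("safely" in compress(adv))     # False: dropped after adversarial edits
```

The compressed prompt loses its safety instruction even though every edit looks like an ordinary prompt refinement, which is what makes compression-layer manipulation stealthy.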
📝 Abstract
LLM-powered agents often use prompt compression to reduce inference costs, but this introduces a new security risk. Compression modules, which are optimized for efficiency rather than safety, can be manipulated by adversarial inputs, causing semantic drift and altering LLM behavior. This work identifies prompt compression as a novel attack surface and presents CompressionAttack, the first framework to exploit it. CompressionAttack includes two strategies: HardCom, which uses discrete adversarial edits for hard compression, and SoftCom, which performs latent-space perturbations for soft compression. Experiments on multiple LLMs show attack success rates of up to 80% and preference flips of up to 98%, with the attacks remaining highly stealthy and transferable. Case studies in VSCode Cline and Ollama confirm real-world impact, and current defenses prove ineffective, highlighting the need for stronger protections.
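The soft-compression side can likewise be sketched in miniature: a PGD-style signed-gradient perturbation nudges a prompt embedding so that its compressed latent drifts toward an attacker-chosen target, while an L-infinity bound keeps the perturbation small. The linear "compressor" `W`, the loss, and all hyperparameters are toy assumptions for illustration, not the paper's SoftCom method.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((2, 8))  # toy compressor: 8-dim embedding -> 2-dim latent

def compress(x):
    return W @ x

def loss(x, target):
    # Distance between compressed latent and attacker-chosen target latent.
    return np.sum((compress(x) - target) ** 2)

def softcom_sketch(x, target, eps=0.5, steps=50, lr=0.05):
    """Signed-gradient (PGD-style) perturbation in embedding space:
    steer x's compressed form toward `target` while staying within
    an L-inf ball of radius eps around the original embedding."""
    x0 = x.copy()
    for _ in range(steps):
        grad = 2 * W.T @ (compress(x) - target)  # analytic gradient of the loss
        x = x - lr * np.sign(grad)               # small signed step
        x = np.clip(x, x0 - eps, x0 + eps)       # bounded, hence stealthy
    return x

x = rng.standard_normal(8)
target = np.array([1.0, -1.0])
x_adv = softcom_sketch(x, target)
print(loss(x_adv, target) < loss(x, target))  # True: latent drifted toward target
```

The key point mirrored here is that the perturbation budget is spent in the compressor's input space, so the downstream LLM only ever sees the (now semantically drifted) compressed representation.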