🤖 AI Summary
This work identifies a novel security vulnerability in LLM agents arising from prompt compression modules: adversaries can manipulate the compression process to induce semantic drift and thereby alter model behavior. To exploit this, the authors propose CompressionAttack, the first framework to treat prompt compression as an independent attack surface. It combines two complementary strategies: hard compression (discrete adversarial editing of prompts) and soft compression (gradient-based perturbation in latent space), enabling efficient, stealthy, and cross-model transferable attacks. Experiments across multiple state-of-the-art LLMs demonstrate attack success rates of up to 80% and preference-reversal rates as high as 98%. Case studies confirm real-world impact on production systems, including VSCode Cline and Ollama. Crucially, existing defenses, which target the input or output layers, prove ineffective against such compression-layer attacks.
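To make the hard-compression threat concrete, here is a toy sketch of the idea: a hard compressor keeps only the top-k "important" tokens, and an attacker greedily appends innocuous-looking filler until a safety-critical token is squeezed out. The length-based scorer, the greedy search, and all names below are illustrative assumptions, not the paper's HardCom implementation.

```python
def importance(token):
    # Stand-in scorer: pretend longer tokens are "more important".
    # Real compressors use learned importance estimates.
    return len(token)

def compress(tokens, k=4):
    # Hard compression: keep the k highest-scoring tokens, preserving order.
    keep = sorted(tokens, key=importance, reverse=True)[:k]
    return [t for t in tokens if t in keep]

def hardcom_sketch(tokens, victim, fillers):
    """Greedily append filler tokens until the victim token
    (e.g. a safety instruction) is dropped by the compressor."""
    tokens = list(tokens)
    for f in fillers:
        if victim not in compress(tokens):
            break  # attack succeeded: victim no longer survives compression
        tokens.append(f)
    return tokens

prompt = ["please", "summarize", "safely", "this", "document"]
adv = hardcom_sketch(prompt, victim="safely",
                     fillers=["comprehensively", "meticulously", "exhaustively"])
print("safely" in compress(prompt))  # True: survives in the clean prompt
print("safely" in compress(adv))     # False: dropped after adversarial edits
```

The compressed prompt loses its safety instruction even though every edit looks like an ordinary prompt refinement, which is what makes compression-layer manipulation stealthy.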
📝 Abstract
LLM-powered agents often use prompt compression to reduce inference costs, but this introduces a new security risk. Compression modules, which are optimized for efficiency rather than safety, can be manipulated by adversarial inputs, causing semantic drift and altering LLM behavior. This work identifies prompt compression as a novel attack surface and presents CompressionAttack, the first framework to exploit it. CompressionAttack includes two strategies: HardCom, which uses discrete adversarial edits for hard compression, and SoftCom, which performs latent-space perturbations for soft compression. Experiments on multiple LLMs show attack success rates of up to 80% and preference flips of up to 98%, with the attacks remaining highly stealthy and transferable. Case studies in VSCode Cline and Ollama confirm real-world impact, and current defenses prove ineffective, highlighting the need for stronger protections.
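The soft-compression side can likewise be sketched in miniature: a PGD-style signed-gradient perturbation nudges a prompt embedding so that its compressed latent drifts toward an attacker-chosen target, while an L-infinity bound keeps the perturbation small. The linear "compressor" `W`, the loss, and all hyperparameters are toy assumptions for illustration, not the paper's SoftCom method.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((2, 8))  # toy compressor: 8-dim embedding -> 2-dim latent

def compress(x):
    return W @ x

def loss(x, target):
    # Distance between compressed latent and attacker-chosen target latent.
    return np.sum((compress(x) - target) ** 2)

def softcom_sketch(x, target, eps=0.5, steps=50, lr=0.05):
    """Signed-gradient (PGD-style) perturbation in embedding space:
    steer x's compressed form toward `target` while staying within
    an L-inf ball of radius eps around the original embedding."""
    x0 = x.copy()
    for _ in range(steps):
        grad = 2 * W.T @ (compress(x) - target)  # analytic gradient of the loss
        x = x - lr * np.sign(grad)               # small signed step
        x = np.clip(x, x0 - eps, x0 + eps)       # bounded, hence stealthy
    return x

x = rng.standard_normal(8)
target = np.array([1.0, -1.0])
x_adv = softcom_sketch(x, target)
print(loss(x_adv, target) < loss(x, target))  # True: latent drifted toward target
```

The key point mirrored here is that the perturbation budget is spent in the compressor's input space, so the downstream LLM only ever sees the (now semantically drifted) compressed representation.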