AI Summary
This work addresses the vulnerability of large language models to prompt injection attacks, a challenge for which existing defenses struggle to balance security, generality, and efficiency. The authors propose RedVisor, a framework that deploys a lightweight, removable inference adapter on top of a frozen backbone model, activated only during inference to provide security without compromising the model's original capabilities. RedVisor introduces three key innovations: it simultaneously performs attack detection and guides safe responses through fine-grained control of the inference path; it employs a zero-copy key-value cache reuse mechanism to eliminate redundant computation; and it seamlessly integrates the defense module into the vLLM inference engine. Experimental results demonstrate that RedVisor significantly outperforms state-of-the-art methods in both detection accuracy and throughput, all while preserving the base model's general-purpose performance.
Abstract
Large Language Models (LLMs) are increasingly vulnerable to Prompt Injection (PI) attacks, where adversarial instructions hidden within retrieved contexts hijack the model's execution flow. Current defenses typically face a critical trade-off: prevention-based fine-tuning often degrades general utility via the "alignment tax", while detection-based filtering incurs prohibitive latency and memory costs. To bridge this gap, we propose RedVisor, a unified framework that synthesizes the explainability of detection systems with the seamless integration of prevention strategies. To the best of our knowledge, RedVisor is the first approach to leverage fine-grained reasoning paths to simultaneously detect attacks and guide the model's safe response. We implement this via a lightweight, removable adapter positioned atop the frozen backbone. This adapter serves a dual function: it first generates an explainable analysis that precisely localizes the injection and articulates the threat, which then explicitly conditions the model to reject the malicious command. Uniquely, the adapter is active only during this reasoning phase and is effectively muted during the subsequent response generation. This architecture yields two distinct advantages: (1) it mathematically preserves the backbone's original utility on benign inputs; and (2) it enables a novel KV Cache Reuse strategy, eliminating the redundant prefill computation inherent to decoupled pipelines. We further pioneer the integration of this defense into the vLLM serving engine with custom kernels. Experiments demonstrate that RedVisor outperforms state-of-the-art defenses in detection accuracy and throughput while incurring negligible utility loss.
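The phase-gating idea in the abstract can be illustrated with a minimal sketch (not the authors' implementation): the adapter contributes a residual delta to each frozen-backbone layer only during the analysis/reasoning phase and is zeroed during response generation, so on the response path the output is identical to the backbone alone. All names here (`backbone_layer`, `adapter_delta`, `PhaseGatedAdapter`, the `phase` flag) are illustrative assumptions standing in for real transformer components.

```python
# Hypothetical sketch of a phase-gated, removable adapter on a frozen backbone.
# During the "analysis" phase the adapter's residual is added; during the
# "response" phase it is muted, so the output exactly matches the backbone,
# which is the sense in which utility on benign inputs is preserved.

def backbone_layer(h):
    # Stand-in for a frozen transformer layer: a fixed affine map.
    return [2.0 * x + 1.0 for x in h]

def adapter_delta(h):
    # Stand-in for the lightweight adapter's residual contribution.
    return [0.1 * x for x in h]

class PhaseGatedAdapter:
    def __init__(self):
        # "analysis": adapter active; "response": adapter muted.
        self.phase = "analysis"

    def forward(self, h):
        out = backbone_layer(h)
        if self.phase == "analysis":
            # Adapter is applied only on the reasoning path.
            out = [o + d for o, d in zip(out, adapter_delta(h))]
        return out

model = PhaseGatedAdapter()
h = [1.0, -2.0, 3.0]

analysis_out = model.forward(h)   # adapter contributes here

model.phase = "response"
response_out = model.forward(h)   # adapter muted: identical to the backbone
assert response_out == backbone_layer(h)
```

Because the muted path is numerically identical to the frozen backbone, a serving engine could in principle reuse the same prefill KV cache across the analysis and response passes, which is the intuition behind the paper's zero-copy KV Cache Reuse claim.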