AgentVisor: Defending LLM Agents Against Prompt Injection via Semantic Virtualization

📅 2026-04-27
📈 Citations: 0
✨ Influential: 0

🤖 AI Summary
This work addresses the vulnerability of large language model (LLM) agents to direct and indirect prompt injection attacks that arise when they process untrusted external data. Existing defenses struggle to balance security and functionality against these attacks. The paper proposes AgentVisor, a framework that, for the first time, brings operating system–level security primitives to LLM agent protection. AgentVisor treats the target agent as an untrusted guest and applies semantic virtualization and privilege separation, so that a trusted semantic visor intercepts every tool invocation. An auditing protocol grounded in OS security primitives and a one-shot self-correction mechanism complete the design, which targets both attack vectors. In experiments, AgentVisor reduces the attack success rate to 0.65% at the cost of a 1.45% average decrease in utility relative to an undefended agent, outperforming existing defense methods.
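The interception step can be pictured with a minimal Python sketch. It assumes that the visor's audit reduces to two classic OS-style checks: least privilege over which tools the task needs, and scoping over what each tool may act on. The source does not confirm this. The sketch also swaps the paper's semantic, LLM-based judgment for a static allowlist. `ToolCall`, `Verdict`, `SemanticVisor`, and `audit` are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class ToolCall:
    """A tool invocation proposed by the untrusted guest agent."""
    tool: str
    args: dict = field(default_factory=dict)


@dataclass
class Verdict:
    allowed: bool
    reason: str


class SemanticVisor:
    """Hypothetical trusted monitor that checks tool calls from the guest.

    `granted` maps each tool the user's task needs to the set of targets it may
    touch; an empty set means the tool is granted with no target restriction.
    Privileges come only from the trusted user task, never from tool outputs,
    so text injected into retrieved data cannot widen them.
    """

    def __init__(self, granted: dict[str, set[str]]):
        self.granted = granted

    def audit(self, call: ToolCall) -> Verdict:
        # Least privilege: a tool the task never needed is denied outright.
        if call.tool not in self.granted:
            return Verdict(False, f"tool '{call.tool}' is not granted for this task")
        # Scope check: restricted tools may only act on pre-approved targets.
        allowed_targets = self.granted[call.tool]
        target = call.args.get("target")
        if allowed_targets and target not in allowed_targets:
            return Verdict(False, f"target '{target}' is outside the task's scope")
        return Verdict(True, "ok")


# Usage: the user asked to email a report to Alice; an instruction injected
# into a retrieved document tries to redirect the email or delete a file.
visor = SemanticVisor({"read_file": set(), "send_email": {"alice@example.com"}})
print(visor.audit(ToolCall("send_email", {"target": "alice@example.com"})).allowed)      # True
print(visor.audit(ToolCall("send_email", {"target": "attacker@evil.example"})).allowed)  # False
print(visor.audit(ToolCall("delete_file", {"target": "report.pdf"})).allowed)            # False
```

This sketch shows only the check itself. Complete mediation in the OS sense would additionally require the tool executor to call `audit()` before every single invocation, with no bypass path.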

πŸ“ Abstract
Large Language Model (LLM) agents are increasingly used to automate complex workflows, but integrating untrusted external data with privileged execution exposes them to severe security risks, particularly direct and indirect prompt injection. Existing defenses face significant challenges in balancing security with utility, often encountering a trade-off where rigorous protection leads to over-defense, or where subtle indirect injections bypass detection. Drawing inspiration from operating system virtualization, we propose AgentVisor, a novel defense framework that enforces semantic privilege separation. AgentVisor treats the target agent as an untrusted guest and intercepts tool calls via a trusted semantic visor. Central to our approach is a rigorous audit protocol grounded in classic OS security primitives, designed to systematically mitigate both direct and indirect injection attacks. Furthermore, we introduce a one-shot self-correction mechanism that transforms security violations into constructive feedback, enabling agents to recover from attacks. Extensive experiments show that AgentVisor reduces the attack success rate to 0.65%, achieving this strong defense while incurring only a 1.45% average decrease in utility relative to the No Defense scenario, demonstrating superior performance compared to existing defense methods.
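The one-shot self-correction mechanism can be illustrated with a similarly hedged Python sketch. It assumes three details that this summary does not confirm:

- A denied call is turned into a feedback message.
- The agent gets exactly one retry.
- A second violation is blocked.

The guest agent is modeled as a plain function `propose(feedback)`. `ALLOWED`, `audit`, and `run_with_one_shot_correction` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class ToolCall:
    tool: str
    target: str


# Hypothetical policy derived from the trusted user task:
# only these (tool, target) pairs are authorized.
ALLOWED = {("read_file", "q3_report.pdf"), ("send_email", "alice@example.com")}


def audit(call: ToolCall) -> Optional[str]:
    """Return None if the call is authorized, else a violation message."""
    if (call.tool, call.target) in ALLOWED:
        return None
    return f"{call.tool}({call.target}) is not authorized by the user's task"


def run_with_one_shot_correction(
    propose: Callable[[Optional[str]], ToolCall],
) -> tuple[str, ToolCall]:
    """Audit the guest's proposed call, allowing exactly one corrective retry.

    `propose(feedback)` stands in for the guest agent: it returns its next tool
    call, given the previous violation message (or None on the first attempt).
    Returns a status label and the final call; actual tool execution is omitted.
    """
    call = propose(None)
    violation = audit(call)
    if violation is None:
        return "allowed", call
    # Turn the violation into constructive feedback and retry once.
    feedback = f"Security violation: {violation}. Continue with the user's original task."
    retry = propose(feedback)
    if audit(retry) is None:
        return "allowed_after_correction", retry
    return "blocked", retry


# Usage: an agent misled by an injected instruction that recovers after feedback.
def misled_agent(feedback: Optional[str]) -> ToolCall:
    if feedback is None:
        return ToolCall("send_email", "attacker@evil.example")
    return ToolCall("send_email", "alice@example.com")


print(run_with_one_shot_correction(misled_agent))
# ('allowed_after_correction', ToolCall(tool='send_email', target='alice@example.com'))
```

Capping the retry at one keeps a persistently hijacked agent from looping against the monitor. An agent that was only momentarily misled can still finish the user's task, which is one way a violation can become "constructive feedback" rather than a hard failure.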
Problem

Research questions and friction points this paper is trying to address.

prompt injection
LLM agents
security
semantic virtualization
privilege separation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Prompt Injection Defense
Semantic Virtualization
Privilege Separation
LLM Agent Security
Self-Correction Mechanism
Zonghao Ying
SKLCCSE, BUAA
Trustworthy AI
Haozheng Wang
Beihang University
Jiangfan Liu
Beihang University
Quanchen Zou
360 AI Security Lab
Aishan Liu
Beihang University
Jian Yang
Beihang University
Yaodong Yang
Boya (博雅) Assistant Professor at Peking University
Reinforcement Learning, AI Alignment, Embodied AI
Xianglong Liu
Beihang University