Prompt Flow Integrity to Prevent Privilege Escalation in LLM Agents

📅 2025-03-17
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
To address privilege escalation vulnerabilities in plugin-augmented LLM agents arising from natural-language prompt-driven execution, this paper proposes Prompt Flow Integrity (PFI), a framework that brings system-security principles into the LLM agent architecture. PFI combines three mitigation techniques: (1) identification of untrusted data entering the agent, (2) enforcement of least privilege on the agent's plugin access, and (3) validation of unsafe data flows from untrusted sources to privileged operations. This design enables untrusted input detection, mandatory least-privilege enforcement, and verification of unsafe data flows. Evaluated across diverse real-world scenarios, PFI achieves 100% mitigation of known privilege escalation attacks, incurs an average latency overhead of less than 8%, and maintains a task completion rate above 99.2%, thereby striking a robust balance between security assurance and functional usability.

📝 Abstract
Large Language Models (LLMs) are combined with plugins to create powerful LLM agents that provide a wide range of services. Unlike traditional software, an LLM agent's behavior is determined at runtime by natural-language prompts from either the user or a plugin's data. This flexibility enables a new computing paradigm with unlimited capabilities and programmability, but it also introduces new security risks, leaving agents vulnerable to privilege escalation attacks. Moreover, user prompts are prone to being interpreted insecurely by LLM agents, creating non-deterministic behaviors that attackers can exploit. To address these security risks, we propose Prompt Flow Integrity (PFI), a system-security-oriented solution to prevent privilege escalation in LLM agents. Analyzing the architectural characteristics of LLM agents, PFI features three mitigation techniques: untrusted data identification, enforcing least privilege on LLM agents, and validating unsafe data flows. Our evaluation shows that PFI effectively mitigates privilege escalation attacks while preserving the utility of LLM agents.
Problem

Research questions and friction points this paper is trying to address.

Prevent privilege escalation in LLM agents
Address insecure interpretation of user prompts
Mitigate non-deterministic behaviors in LLM agents
Innovation

Methods, ideas, or system contributions that make the work stand out.

Untrusted data identification in LLM agents
Enforcing least privilege on LLM agents
Validating unsafe data flows in LLM agents
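
The three techniques above can be sketched in a few lines. The following is a minimal, hypothetical illustration (all names — `Data`, `Agent`, `PRIVILEGED` — are illustrative, not from the paper): data is tagged with a trust label at ingestion (untrusted data identification), plugin calls are restricted to a per-task allowlist (least privilege), and untrusted data is blocked from reaching privileged plugins (unsafe data-flow validation).

```python
from dataclasses import dataclass, field

# Plugins assumed privileged for this sketch.
PRIVILEGED = {"send_email", "execute_code"}

@dataclass
class Data:
    value: str
    trusted: bool  # untrusted data identification: provenance label

@dataclass
class Agent:
    allowed_plugins: set[str]                      # least privilege: per-task allowlist
    seen: list[Data] = field(default_factory=list)

    def ingest(self, data: Data) -> None:
        # Record every piece of data the agent observes, with its trust label,
        # so downstream flows can be checked against provenance.
        self.seen.append(data)

    def call_plugin(self, name: str, arg: Data) -> str:
        # 1) Least privilege: refuse plugins outside the task's allowlist.
        if name not in self.allowed_plugins:
            raise PermissionError(f"plugin '{name}' not permitted for this task")
        # 2) Unsafe data-flow validation: untrusted data must not reach
        #    a privileged plugin without explicit authorization.
        if not arg.trusted and name in PRIVILEGED:
            raise PermissionError("untrusted data flowing into privileged plugin")
        return f"{name}({arg.value})"
```

For example, an agent allowed only `{"web_search", "send_email"}` can search with trusted user input, but a `send_email` call whose argument originated from untrusted plugin output is rejected rather than silently executed — the data-flow check, not the prompt, is the enforcement point.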