AI Summary
Traditional text-level safety mechanisms fail to adequately address security risks arising from LLM agent behaviors. To bridge this gap, this paper proposes GuardAgent, the first dynamic, agent-level safety guard framework. Its core is knowledge-enabled, two-stage LLM reasoning (safety-requirement parsing → plan-to-code mapping) combined with memory-based retrieval of in-context demonstrations, enabling real-time behavioral verification of target agents through lightweight, executable code-based guardrails. The paper introduces this agent-level guarding paradigm along with two purpose-built benchmarks: EICU-AC (access control for healthcare agents) and Mind2Web-SC (safety policies for web agents). Experiments show GuardAgent achieves over 98% and 83% guardrail accuracy on these benchmarks, respectively, effectively suppressing policy violations while maintaining high flexibility, low operational overhead, and strong generalization across diverse agent tasks and environments.
Abstract
The rapid advancement of large language model (LLM) agents has raised new concerns regarding their safety and security, which cannot be addressed by traditional textual-harm-focused LLM guardrails. We propose GuardAgent, the first guardrail agent to protect target agents by dynamically checking whether their actions satisfy given safety guard requests. Specifically, GuardAgent first analyzes a safety guard request to generate a task plan, and then maps this plan into guardrail code for execution. By executing this code, GuardAgent can deterministically enforce the safety guard request and safeguard the target agent. In both steps, an LLM serves as the reasoning component, supplemented by in-context demonstrations retrieved from a memory module that stores experiences from previous tasks. GuardAgent can understand diverse safety guard requests and provide reliable code-based guardrails with high flexibility and low operational overhead. In addition, we propose two novel benchmarks: the EICU-AC benchmark, which assesses access control for healthcare agents, and the Mind2Web-SC benchmark, which evaluates safety policies for web agents. We show that GuardAgent effectively moderates violation actions for different types of agents on these two benchmarks, with over 98% and 83% guardrail accuracy, respectively. Project page: https://guardagent.github.io/
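The two-stage flow described in the abstract (retrieve demonstrations from memory, parse the request into a plan, map the plan to code, then execute the code as a deterministic check) can be sketched as below. This is a minimal illustration, not the paper's implementation: the two LLM calls are stubbed with fixed functions, and `Memory`, `plan_llm`, `codegen_llm`, and `guard` are hypothetical names introduced here.

```python
from dataclasses import dataclass, field

@dataclass
class Memory:
    """Stores experiences (request/plan/code) from previous guarding tasks."""
    entries: list = field(default_factory=list)

    def retrieve(self, request: str, k: int = 1):
        # Naive keyword-overlap retrieval; a real system would use embeddings.
        scored = sorted(
            self.entries,
            key=lambda e: len(set(request.split()) & set(e["request"].split())),
            reverse=True,
        )
        return scored[:k]

def plan_llm(request: str, demos) -> list:
    """Stage 1 stand-in: parse the safety guard request into a task plan.
    In GuardAgent this is an LLM call conditioned on retrieved demos."""
    return ["extract the user role from the agent's inputs",
            "check the role against the access-control rule"]

def codegen_llm(plan, demos) -> str:
    """Stage 2 stand-in: map the task plan to executable guardrail code.
    The allowed-role set here is a made-up example rule."""
    return (
        "def guardrail(action):\n"
        "    allowed = {'physician', 'nurse'}\n"
        "    return action.get('role') in allowed\n"
    )

def guard(request: str, action: dict, memory: Memory) -> bool:
    """Run the full pipeline and return True iff the action is permitted."""
    demos = memory.retrieve(request)
    plan = plan_llm(request, demos)
    code = codegen_llm(plan, demos)
    scope = {}
    exec(code, scope)  # code execution makes the check deterministic
    return scope["guardrail"](action)

memory = Memory([{"request": "restrict database access by role"}])
ok = guard("only physicians and nurses may query patient vitals",
           {"role": "physician", "target": "vitals"}, memory)
denied = guard("only physicians and nurses may query patient vitals",
               {"role": "admin", "target": "vitals"}, memory)
```

The key design point the sketch mirrors is that the final allow/deny decision comes from running generated code, not from another LLM judgment, which is what makes the guardrail deterministic once the code is produced.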