LLMZ+: Contextual Prompt Whitelist Principles for Agentic LLMs

📅 2025-09-22

📈 Citations: 0

✨ Influential: 0

career value

220K/year

🤖 AI Summary

Agent-based large language models (LLMs) pose severe operational and information security risks due to their privileged access to data sources and APIs, coupled with autonomous decision-making over execution paths—guided only by high-level objectives. Existing defenses rely on post-hoc malicious intent detection (e.g., prompt injection identification), which fails against jailbreaking attacks and suffers from high false-positive and false-negative rates. Method: We propose a context-aware prompt whitelist mechanism that performs proactive, fine-grained contextual compliance verification on user inputs. It dynamically models execution context and matches inputs against granular, scenario-specific policies, admitting only prompts aligned with predefined business workflows. Contribution/Results: By shifting from reactive detection to verifiable, pre-execution input constraints, our approach achieves robustness and maintainability gains. Experiments demonstrate complete mitigation of mainstream jailbreaking attacks with zero false positives and zero false negatives—ensuring uninterrupted, secure business interactions.

Technology Category

Application Category

📝 Abstract

Compared to traditional models, agentic AI represents a highly valuable target for potential attackers as they possess privileged access to data sources and API tools, which are traditionally not incorporated into classical agents. Unlike a typical software application residing in a Demilitarized Zone (DMZ), agentic LLMs consciously rely on nondeterministic behavior of the AI (only defining a final goal, leaving the path selection to LLM). This characteristic introduces substantial security risk to both operational security and information security. Most common existing defense mechanism rely on detection of malicious intent and preventing it from reaching the LLM agent, thus protecting against jailbreak attacks such as prompt injection. In this paper, we present an alternative approach, LLMZ+, which moves beyond traditional detection-based approaches by implementing prompt whitelisting. Through this method, only contextually appropriate and safe messages are permitted to interact with the agentic LLM. By leveraging the specificity of context, LLMZ+ guarantees that all exchanges between external users and the LLM conform to predefined use cases and operational boundaries. Our approach streamlines the security framework, enhances its long-term resilience, and reduces the resources required for sustaining LLM information security. Our empirical evaluation demonstrates that LLMZ+ provides strong resilience against the most common jailbreak prompts. At the same time, legitimate business communications are not disrupted, and authorized traffic flows seamlessly between users and the agentic LLM. We measure the effectiveness of approach using false positive and false negative rates, both of which can be reduced to 0 in our experimental setting.

Problem

Research questions and friction points this paper is trying to address.

Securing agentic LLMs from attacks exploiting their API access

Replacing detection-based defenses with contextual prompt whitelisting

Ensuring safe user-LLM exchanges within predefined operational boundaries

Innovation

Methods, ideas, or system contributions that make the work stand out.

Implements contextual prompt whitelisting for agentic LLMs

Permits only contextually appropriate and safe messages

Guarantees exchanges conform to predefined use cases

🔎 Similar Papers

Towards Hierarchical Multi-Agent Workflows for Zero-Shot Prompt Optimization