Enforcing Benign Trajectories: A Behavioral Firewall for Structured-Workflow AI Agents

📅 2026-04-29

🤖 AI Summary
This work addresses the risk of malicious behavior in large language model-driven AI agents that invoke external tools within structured workflows. The authors propose a telemetry-driven behavioral anomaly detection firewall that adapts sequence-based intrusion detection principles to AI agent security. By learning benign tool-invocation sequences offline, the method constructs a parameterized deterministic finite automaton (pDFA) that enforces valid tool sequences, contextual constraints, and parameter bounds at runtime via an O(1) state-transition lookup. On the Agent Security Bench, this lightweight, structure-aware runtime defense achieves a 5.6% macro-averaged attack success rate across five scenarios, dropping to 2.2% in structured workflows (versus 12.8% for Aegis, a state-of-the-art stateless scanner), and 0% against multi-step and context-sequential attacks in structured settings. It adds 2.2 ms of per-call latency and incurs a 2.0% benign task failure rate.
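To make the enforcement idea concrete, here is a minimal sketch of a pDFA-style gateway check, assuming a transition table keyed by (state, tool) and a simple per-transition parameter guard. The names `ParamGuard`, `BehavioralFirewall`, `TRANSITIONS`, and the tool names are hypothetical illustrations, not the paper's implementation.

```python
from dataclasses import dataclass, field


@dataclass
class ParamGuard:
    """Hypothetical parameter bounds attached to one transition."""
    allowed: dict = field(default_factory=dict)  # name -> set of exact-match values
    ranges: dict = field(default_factory=dict)   # name -> (low, high) numeric bounds

    def permits(self, args):
        # Exact-match whitelist for sensitive parameters.
        for name, values in self.allowed.items():
            if args.get(name) not in values:
                return False
        # Closed-interval bounds for numeric parameters.
        for name, (low, high) in self.ranges.items():
            value = args.get(name)
            if not isinstance(value, (int, float)) or not low <= value <= high:
                return False
        return True


class BehavioralFirewall:
    """Tracks the current state and admits only transitions seen in benign telemetry."""

    def __init__(self, start, transitions):
        self.state = start
        self.transitions = transitions  # {(state, tool): (next_state, ParamGuard)}

    def check(self, tool, args):
        # One dictionary lookup per call: average-case O(1).
        entry = self.transitions.get((self.state, tool))
        if entry is None:
            return False  # tool not permitted in this sequential context
        next_state, guard = entry
        if not guard.permits(args):
            return False  # sequence is valid, but parameters are out of bounds
        self.state = next_state
        return True


# Assumed example workflow: look up an order, then email a fixed address.
TRANSITIONS = {
    ("start", "search_orders"): ("looked_up", ParamGuard(ranges={"limit": (1, 50)})),
    ("looked_up", "send_email"): ("done", ParamGuard(allowed={"to": {"support@example.com"}})),
}
```

In this sketch a blocked call leaves the state unchanged, so a rejected tool call cannot advance the workflow. How the actual system handles rejection, and how the table is compiled from telemetry, is not specified here.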
📝 Abstract
Structured-workflow agents driven by large language models execute tool calls against sensitive external environments. We propose \codename, a telemetry-driven behavioral anomaly detection firewall. Drawing on sequence-based intrusion detection, \codename\ compiles verified benign tool-call telemetry into a parameterized deterministic finite automaton (pDFA). The model defines permitted tool sequences, sequential contexts, and parameter bounds. At runtime, a lightweight gateway enforces these boundaries via an $O(1)$ state-transition structural lookup, shifting computationally expensive analysis entirely offline. Evaluated on the Agent Security Bench (ASB), \codename\ achieves a 5.6\% macro-averaged attack success rate (ASR) across five scenarios. Within three structured workflows, ASR drops to 2.2\%, outperforming \textsc{Aegis}, a state-of-the-art stateless scanner, which reaches 12.8\%. \codename\ achieves 0\% ASR on multi-step and context-sequential attacks in structured settings. Furthermore, against 1,000 algorithmically spliced exfiltration payloads, only 1.4\% matched valid structural paths, all of which failed end-to-end string parameter guards (0 successes out of 14 surviving paths, 95\% CI [0\%, 23.2\%]). \codename\ introduces just 2.2~ms of per-call latency (a 3.7$\times$ speedup over \textsc{Aegis}) while maintaining a 2.0\% benign task failure rate (BTFR). Modeling the behavioral trajectory effectively collapses the available attack surface, but unmaintained continuous parameter bounds remain vulnerable to synonym-substitution attacks (18\% evasion rate). Thus, exact-match whitelisting of sensitive parameters ultimately bears the final defensive load before execution.
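The reported interval for 0 successes out of 14 surviving paths can be sanity-checked with a short calculation. The abstract does not say which interval method was used. The sketch below assumes a two-sided exact (Clopper-Pearson) binomial interval, which for zero observed successes has the closed-form upper bound $1 - (\alpha/2)^{1/n}$. The function name is a hypothetical helper.

```python
def upper_bound_zero_successes(n, confidence=0.95):
    """Upper limit of a two-sided Clopper-Pearson interval when 0 of n trials succeed.

    Assumption: the exact binomial interval is used; with zero successes the
    upper limit reduces to 1 - (alpha / 2) ** (1 / n).
    """
    alpha = 1.0 - confidence
    return 1.0 - (alpha / 2.0) ** (1.0 / n)


# 1.4% of 1,000 spliced payloads matched a valid structural path.
surviving_paths = round(1000 * 0.014)
upper = upper_bound_zero_successes(surviving_paths)

print(surviving_paths)   # 14
print(f"{upper:.1%}")    # 23.2%
```

Under this assumption the calculation matches the stated [0\%, 23.2\%] range. The bound is wide because only 14 payloads reached the parameter guards, so the 0\% observed rate is consistent with a true success rate of up to roughly one in four.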
Problem

Research questions and friction points this paper is trying to address.

behavioral anomaly detection
structured-workflow agents
tool-call security
attack surface reduction
parameter validation
Innovation

Methods, ideas, or system contributions that make the work stand out.

behavioral firewall
parameterized deterministic finite automaton
tool-call telemetry
structured-workflow agents
anomaly detection