Enforcing Benign Trajectories: A Behavioral Firewall for Structured-Workflow AI Agents

📅 2026-04-29

🤖 AI Summary

This work addresses the risk of malicious behavior when AI agents driven by large language models invoke external tools within structured workflows. The authors propose a telemetry-driven behavioral anomaly detection firewall that adapts sequence-based intrusion detection principles to AI agent security. The method learns benign tool-invocation sequences offline and compiles them into a parameterized deterministic finite automaton (pDFA). A lightweight runtime gateway then uses the pDFA to enforce valid tool sequences, sequential context, and parameter bounds with an O(1) state-transition lookup per call. On the Agent Security Bench, the firewall achieves a 5.6% macro-averaged attack success rate across five scenarios, falling to 2.2% in the three structured workflows (versus 12.8% for Aegis, a state-of-the-art stateless scanner), and 0% against multi-step and context-sequential attacks in structured settings. It adds 2.2 ms of per-call latency and has a 2.0% benign task failure rate. The reported weak point is unmaintained continuous parameter bounds, which synonym-substitution attacks evade 18% of the time.
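To make the mechanism in the summary concrete, here is a minimal Python sketch of a pDFA-style gateway. It is an illustration under stated assumptions, not the authors' implementation: the class names (`ParamGuard`, `BehavioralFirewall`), the tool names, the choice to use the previous tool as the automaton state, and the guard rules (min/max range for numbers, exact-match set for other values) are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class ParamGuard:
    """Parameter bounds for one transition (hypothetical structure)."""
    numeric: dict = field(default_factory=dict)  # name -> (lo, hi) seen in benign traces
    allowed: dict = field(default_factory=dict)  # name -> set of exact benign values

    def observe(self, params):
        # Widen numeric ranges and grow whitelists from one benign call.
        for name, value in params.items():
            if isinstance(value, (int, float)):
                lo, hi = self.numeric.get(name, (value, value))
                self.numeric[name] = (min(lo, value), max(hi, value))
            else:
                self.allowed.setdefault(name, set()).add(value)

    def permits(self, params):
        # Reject out-of-range numbers, unseen strings, and unknown parameters.
        for name, value in params.items():
            if name in self.numeric:
                lo, hi = self.numeric[name]
                if not isinstance(value, (int, float)) or not lo <= value <= hi:
                    return False
            elif value not in self.allowed.get(name, ()):
                return False
        return True


class BehavioralFirewall:
    """Assumption: the automaton state is simply the previously called tool."""
    START = "<start>"

    def __init__(self):
        self.transitions = {}  # (state, tool) -> ParamGuard

    def learn(self, traces):
        # Offline phase: compile benign traces into transitions and guards.
        for trace in traces:
            state = self.START
            for tool, params in trace:
                self.transitions.setdefault((state, tool), ParamGuard()).observe(params)
                state = tool

    def check(self, state, tool, params):
        # Runtime phase: one dictionary lookup, then the parameter guard.
        guard = self.transitions.get((state, tool))
        if guard is None or not guard.permits(params):
            return False, state  # block the call and stay in the current state
        return True, tool


# Hypothetical benign telemetry for a refund workflow.
benign = [
    [("lookup_order", {"region": "EU"}),
     ("issue_refund", {"amount": 20.0, "to": "original_card"})],
    [("lookup_order", {"region": "EU"}),
     ("issue_refund", {"amount": 50.0, "to": "original_card"})],
]
fw = BehavioralFirewall()
fw.learn(benign)

in_order = fw.check("lookup_order", "issue_refund", {"amount": 30.0, "to": "original_card"})
skipped_step = fw.check(fw.START, "issue_refund", {"amount": 30.0, "to": "original_card"})
bad_target = fw.check("lookup_order", "issue_refund", {"amount": 30.0, "to": "attacker_card"})
```

In this sketch the runtime check costs one hash lookup plus a scan over the call's parameters, so all expensive work sits in `learn`. A call that skips a step fails on the missing transition, while a call on a valid path with an unseen recipient fails on the exact-match guard.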

📝 Abstract

Structured-workflow agents driven by large language models execute tool calls against sensitive external environments. We propose a telemetry-driven behavioral anomaly detection firewall. Drawing on sequence-based intrusion detection, the firewall compiles verified benign tool-call telemetry into a parameterized deterministic finite automaton (pDFA). The model defines permitted tool sequences, sequential contexts, and parameter bounds. At runtime, a lightweight gateway enforces these boundaries via an O(1) state-transition structural lookup, shifting computationally expensive analysis entirely offline. Evaluated on the Agent Security Bench (ASB), the firewall achieves a 5.6% macro-averaged attack success rate (ASR) across five scenarios. Within three structured workflows, ASR drops to 2.2%, outperforming Aegis, a state-of-the-art stateless scanner, at 12.8%. The firewall achieves 0% ASR on multi-step and context-sequential attacks in structured settings. Furthermore, against 1,000 algorithmically spliced exfiltration payloads, only 1.4% matched valid structural paths, all of which failed end-to-end string parameter guards (0 successes out of 14 surviving paths, 95% CI [0%, 23.2%]). The firewall introduces just 2.2 ms of per-call latency (a 3.7× speedup over Aegis) while maintaining a 2.0% benign task failure rate (BTFR) on benign workloads. Modeling the behavioral trajectory effectively collapses the available attack surface, but unmaintained continuous parameter bounds remain vulnerable to synonym-substitution attacks (18% evasion rate). Thus, exact-match whitelisting of sensitive parameters ultimately bears the final defensive load against execution.
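The interval quoted for the spliced-payload experiment can be checked with a few lines of arithmetic. The abstract does not say which interval method was used; the sketch below assumes an exact two-sided Clopper-Pearson interval, which for zero observed successes has the closed-form upper bound 1 - (α/2)^(1/n). The function name is hypothetical.

```python
def zero_success_upper_bound(n, alpha=0.05):
    """Upper limit of the two-sided Clopper-Pearson interval when 0 of n trials succeed."""
    return 1.0 - (alpha / 2.0) ** (1.0 / n)


payloads = 1000
surviving = round(payloads * 0.014)  # 1.4% of payloads matched a valid structural path
upper = zero_success_upper_bound(surviving)
print(f"{surviving} surviving paths, 95% upper bound = {upper:.1%}")
```

Under that assumption the bound for 14 surviving paths comes out at about 23.2%, which agrees with the reported interval. The width of the interval reflects the small number of surviving paths rather than any observed success.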

Problem

Research questions and friction points this paper is trying to address.

behavioral anomaly detection

structured-workflow agents

tool-call security

attack surface reduction

parameter validation

Innovation

Methods, ideas, or system contributions that make the work stand out.

behavioral firewall

parameterized deterministic finite automaton

tool-call telemetry