🤖 AI Summary
This work addresses key limitations of ReAct-based large language model (LLM) agents (serial execution latency, context bloat, and vulnerability to prompt injection and hallucination during tool invocation) by introducing KAIJU, an execution-kernel architecture built around Intent-Gated Execution (IGX), a novel security paradigm. KAIJU decouples LLM reasoning from tool execution at the system level, enabling dependency-aware parallel scheduling and robust security control. The framework authorizes each tool call along four independent dimensions (scope, intent, impact, and clearance) and supports three adaptive execution modes. While incurring only minor latency overhead on simple queries, KAIJU significantly improves efficiency on complex tasks: it outperforms ReAct baselines in compute-intensive, parallel data-collection scenarios and matches baseline performance on moderate-complexity tasks, all while providing verifiable guarantees on agent behaviour.
📝 Abstract
Tool-calling autonomous agents based on large language models using ReAct exhibit three limitations: serial latency, quadratic context growth, and vulnerability to prompt injection and hallucination. Recent work moves towards separating planning from execution, but in each case the model remains coupled to the execution mechanics. We introduce a system-level abstraction for LLM agents that decouples the execution of agent workflows from the LLM reasoning layer. We define two first-class abstractions: (1) Intent-Gated Execution (IGX), a security paradigm that enforces intent at execution time, and (2) an Executive Kernel that manages scheduling, tool dispatch, dependency resolution, failure handling, and security. In KAIJU, the LLM plans upfront, and tools are optimistically scheduled in parallel with dependency-aware parameter injection. Each tool call is authorised via IGX on four independent variables: scope, intent, impact, and clearance (external approval). KAIJU supports three adaptive execution modes (Reflect, nReflect, and Orchestrator), providing progressively finer-grained execution control suited to complex investigation, deep analysis, and research tasks. Empirical evaluation against a ReAct baseline shows that KAIJU incurs a latency penalty on simple queries due to planning overhead, converges with the baseline at moderate complexity, and holds a structural advantage on computational queries requiring parallel data gathering. Beyond latency, the separation enforces behavioural guarantees that ReAct cannot match through prompting alone. Code available at https://github.com/compdeep/kaiju
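To make the four-variable authorisation concrete, the sketch below shows one plausible shape of an IGX-style gate that checks scope, intent, impact, and clearance independently before a tool call is dispatched. All names (`ToolCall`, `Policy`, `authorize`, the `Impact` levels) are illustrative assumptions, not KAIJU's actual API; the key property it demonstrates is that every dimension must pass on its own, so a call cannot be authorised by intent alone.

```python
from dataclasses import dataclass
from enum import Enum


class Impact(Enum):
    """Hypothetical impact ladder; higher values are more consequential."""
    READ_ONLY = 0
    REVERSIBLE = 1
    IRREVERSIBLE = 2


@dataclass(frozen=True)
class ToolCall:
    tool: str
    scope: str     # resource the call touches, e.g. "fs:/tmp"
    intent: str    # declared purpose taken from the upfront plan
    impact: Impact


@dataclass
class Policy:
    allowed_scopes: set
    allowed_intents: set
    max_impact: Impact
    clearance_granted: bool  # external approval for high-impact actions


def authorize(call: ToolCall, policy: Policy) -> bool:
    """Gate a tool call on all four variables independently."""
    if call.scope not in policy.allowed_scopes:
        return False                     # out-of-scope resource
    if call.intent not in policy.allowed_intents:
        return False                     # intent not in the approved plan
    if call.impact.value > policy.max_impact.value:
        return False                     # impact exceeds the policy ceiling
    if call.impact is Impact.IRREVERSIBLE and not policy.clearance_granted:
        return False                     # irreversible actions need clearance
    return True
```

In this sketch the gate sits in the execution kernel rather than in the prompt, which is what lets the behavioural guarantee survive prompt injection: a hallucinated or injected tool call still has to clear all four checks.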