🤖 AI Summary
Existing meta-agent systems lack both a formal model of their operations and an efficient, traceable execution mechanism, which hinders intervention, counterfactual optimization, and scalable training. This work proposes Shepherd, a functional programming model that formalizes meta-agent operations on target agents as functions, with the core operations mechanized in Lean. Every agent-environment interaction is recorded as a typed event in a Git-like execution trace, so any past state can be forked and replayed. The work is presented as the first framework to unify formal operational semantics with fully traceable execution for meta-agents. Reported results: on CooperBench, a live supervisor raises pair-coding pass rates from 28.8% to 54.7% (+25.9 percentage points); counterfactual meta-optimization beats baselines by up to 11 points across four benchmarks while cutting wall-clock time by up to 58%; Tree-RL training improves TerminalBench-2 performance from 34.2% to 39.4%; forking the agent process and its filesystem is five times faster than with Docker; and prompt-cache reuse on replay exceeds 95%.
📝 Abstract
We introduce Shepherd, a functional programming model that formalizes meta-agent operations on target agents as functions, with core operations mechanized in Lean. Shepherd records every agent-environment interaction as a typed event in a Git-like execution trace, enabling any past state to be forked and replayed. The system forks the agent process and its filesystem $5\times$ faster than Docker and achieves $>95\%$ prompt-cache reuse on replay. We demonstrate the model through three applications. First, in runtime intervention, a live supervisor increases pair-coding pass rates on CooperBench from 28.8% to 54.7%. Second, in counterfactual meta-optimization, branching exploration outperforms baselines across four benchmarks by up to 11 points while reducing wall-clock time by up to 58%. Third, in Tree-RL training, forking rollouts at selected turns improves TerminalBench-2 performance from 34.2% to 39.4%. These results establish Shepherd as an efficient infrastructure for programming meta-agents. We open-source the system to support future research.
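To make the "Git-like, typed execution trace" concrete, the following is a minimal Python sketch of the general idea, not Shepherd's actual API. The names `Action`, `Observation`, `Commit`, `fork`, and `replay` are hypothetical and chosen only for illustration. The sketch assumes that events are immutable and that each trace node points to its parent, so a fork shares its entire prefix with the branch it came from.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional, Union


@dataclass(frozen=True)
class Action:
    """Hypothetical event type: something the agent did."""
    command: str


@dataclass(frozen=True)
class Observation:
    """Hypothetical event type: what the environment returned."""
    output: str


Event = Union[Action, Observation]


@dataclass(frozen=True)
class Commit:
    """One node of the trace; `parent` links back toward the root, as in Git."""
    event: Event
    parent: Optional[Commit] = None

    def append(self, event: Event) -> Commit:
        # Appending never mutates history; it creates a new child node.
        return Commit(event, self)

    def history(self) -> list[Event]:
        # Walk parent links to the root, then reverse into chronological order.
        events: list[Event] = []
        node: Optional[Commit] = self
        while node is not None:
            events.append(node.event)
            node = node.parent
        events.reverse()
        return events


def fork(commit: Commit) -> Commit:
    """Fork at a past state: the new branch reuses the same immutable node."""
    return commit


def replay(commit: Commit) -> list[str]:
    """Rebuild a state (here just a transcript) by folding over the events."""
    transcript: list[str] = []
    for event in commit.history():
        if isinstance(event, Action):
            transcript.append(f"$ {event.command}")
        else:
            transcript.append(event.output)
    return transcript


# Two branches that diverge after a shared two-event prefix.
base = Commit(Action("ls")).append(Observation("main.py"))
branch_a = base.append(Action("pytest"))
branch_b = fork(base).append(Action("cat main.py"))
```

In this sketch, forking costs nothing beyond a reference copy because both branches point at the same `base` node. A shared prefix of this kind is also what would allow a replay to reuse a prompt cache, although the abstract does not describe how Shepherd achieves its reported reuse rate or how it forks the underlying process and filesystem.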