The Last Harness You'll Ever Build

📅 2026-04-22
📈 Citations: 0
Influential: 0

🤖 AI Summary
Current AI agents require expert-crafted workflow harnesses to adapt to new task domains, a process that is costly and scales poorly. This work proposes a two-level automated framework. An inner Harness Evolution Loop, made up of Worker, Evaluator, and Evolution Agent modules, automatically optimizes the harness for an individual task, while an outer Meta-Evolution Loop optimizes the evolution protocol itself across tasks, so that harness generation adapts at the meta level. This approach represents the first fully automated harness engineering methodology and supports a "build once, deploy anywhere" paradigm for general-purpose deployment. Experimental results demonstrate that the system can rapidly adapt to novel task domains without human intervention, substantially lowering deployment barriers and improving cross-domain generalization.
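The inner loop described above can be illustrated with a deliberately tiny, runnable sketch. Everything in it is a hypothetical stand-in: in the paper the Worker, Evaluator, and Evolution Agent are agents built on a foundation model, whereas here a harness is assumed to be just a set of instruction strings, a task is a set of required steps, and the three agents are deterministic functions (`worker`, `evaluator`, `evolution_agent`) invented for illustration. It shows only the control flow: execute, diagnose and score, then revise the harness using the history of attempts.

```python
from dataclasses import dataclass

# Hypothetical toy representation: a harness is a frozenset of instruction
# strings. Real harnesses bundle prompts, tools, and orchestration logic.
Harness = frozenset


@dataclass
class Attempt:
    harness: Harness
    score: float
    diagnosis: list


def worker(harness, task):
    """Toy Worker W_H: completes only the required steps its harness covers."""
    return {step for step in task["required"] if step in harness}


def evaluator(output, task):
    """Toy Evaluator V: scores the output and lists the failures it found."""
    missing = sorted(task["required"] - output)
    score = 1 - len(missing) / len(task["required"])
    return score, missing


def evolution_agent(history):
    """Toy Evolution Agent E: scans the full history, takes the best harness
    so far, and patches the first failure diagnosed for it."""
    best = max(history, key=lambda a: a.score)
    if not best.diagnosis:
        return best.harness
    return best.harness | {best.diagnosis[0]}


def harness_evolution_loop(task, h0, max_iters=10, target=1.0):
    """Execute, evaluate, evolve; return the best harness and the history."""
    history = []
    harness = h0
    for _ in range(max_iters):
        output = worker(harness, task)
        score, diagnosis = evaluator(output, task)
        history.append(Attempt(harness, score, diagnosis))
        if score >= target:
            break
        harness = evolution_agent(history)
    best = max(history, key=lambda a: a.score)
    return best.harness, history
```

For example, for a task whose required steps are `login`, `fill_form`, and `submit`, starting from a harness that contains only `login`, the sketch fixes one diagnosed failure per iteration and reaches a full score on the third attempt.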

📝 Abstract
AI agents are increasingly deployed on complex, domain-specific workflows: navigating enterprise web applications that require dozens of clicks and form fills, orchestrating multi-step research pipelines that span search, extraction, and synthesis, automating code review across unfamiliar repositories, and handling customer escalations that demand nuanced domain knowledge. Each new task domain requires painstaking, expert-driven harness engineering: designing the prompts, tools, orchestration logic, and evaluation criteria that make a foundation model effective. We present a two-level framework that automates this process. At the first level, the Harness Evolution Loop optimizes a worker agent's harness $\mathcal{H}$ for a single task: a Worker Agent $W_{\mathcal{H}}$ executes the task, an Evaluator Agent $V$ adversarially diagnoses failures and scores performance, and an Evolution Agent $E$ modifies the harness based on the full history of prior attempts. At the second level, the Meta-Evolution Loop optimizes the evolution protocol $\Lambda = (W_{\mathcal{H}}, \mathcal{H}^{(0)}, V, E)$ itself across diverse tasks, learning a protocol $\Lambda^{(\text{best})}$ that enables rapid harness convergence on any new task, so that adapting an agent to a novel domain requires no human harness engineering at all. We formalize the correspondence to meta-learning and present both algorithms. The framework turns manual harness engineering into automated harness engineering and goes one step further by automating the design of the automation itself.
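The outer loop can be sketched in the same toy setting. This is not the paper's algorithm: the protocol $\Lambda$ is assumed to expose only two tunable knobs (the initial harness `h0`, standing in for $\mathcal{H}^{(0)}$, and a hypothetical `patch_size`, the number of diagnosed failures the Evolution Agent fixes per iteration), the cross-task objective is assumed to be the mean number of inner-loop iterations until a task is fully solved, and a greedy hill-climbing search stands in for the meta-level optimizer. All names are hypothetical.

```python
from dataclasses import dataclass, replace
from statistics import mean


@dataclass(frozen=True)
class Protocol:
    """Hypothetical toy version of Λ = (W_H, H^(0), V, E). The worker and
    evaluator are fixed inline below; only h0 and patch_size are tunable."""
    h0: frozenset
    patch_size: int  # how many diagnosed failures E fixes per iteration


def run_inner_loop(protocol, required, max_iters=20):
    """Toy inner Harness Evolution Loop; returns iterations to full score."""
    harness = protocol.h0
    for it in range(1, max_iters + 1):
        output = harness & required               # toy worker W_H
        missing = sorted(required - output)       # toy evaluator V diagnosis
        if not missing:
            return it
        harness = harness | set(missing[: protocol.patch_size])  # toy E
    return max_iters + 1  # penalty: the loop did not converge


def meta_cost(protocol, tasks):
    """Assumed cross-task objective: mean iterations over training tasks."""
    return mean(run_inner_loop(protocol, t) for t in tasks)


def propose(protocol, tasks):
    """Toy meta-evolution step: candidate edits to the protocol itself."""
    candidates = []
    # Seed h0 with a step that every training task requires, if one is left.
    shared = frozenset.intersection(*tasks) - protocol.h0
    if shared:
        candidates.append(replace(protocol, h0=protocol.h0 | {min(shared)}))
    candidates.append(replace(protocol, patch_size=protocol.patch_size + 1))
    return candidates


def meta_evolution_loop(tasks, protocol, rounds=5):
    """Greedy search over protocols; keeps an edit only if it lowers the cost."""
    best, best_cost = protocol, meta_cost(protocol, tasks)
    for _ in range(rounds):
        scored = [(meta_cost(p, tasks), p) for p in propose(best, tasks)]
        cost, cand = min(scored, key=lambda s: s[0])
        if cost >= best_cost:
            break  # no proposed edit improves the cross-task objective
        best, best_cost = cand, cost
    return best, best_cost
```

The search stops as soon as no proposed edit lowers the mean iteration count across the training tasks. Greedy acceptance keeps the sketch short; a real meta-evolution agent would presumably propose much richer edits to prompts, tools, and evaluation criteria.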
Problem

Research questions and friction points this paper is trying to address.

harness engineering
AI agents
domain-specific workflows
task adaptation
automation
Innovation

Methods, ideas, or system contributions that make the work stand out.

automated harness engineering
meta-evolution
agent orchestration
foundation model adaptation
adversarial evaluation