Workspace Optimization: How to Train Your Agent

📅 2026-05-10

🤖 AI Summary

This work addresses a limitation of agents built on large language models with frozen weights: they cannot update parameters, yet complex multi-turn environments require learning through interaction. The authors propose workspace optimization, which moves training from weight space to a structured external workspace that the agent reads, writes, and tests. The method replaces parameters, data, losses, and gradients with artifacts, evidence, counterexamples, and textual feedback, respectively, so the agent can improve without any change to model weights. It is implemented in DreamTeam, a multi-agent harness for ARC-AGI-3 whose roles build an executable world model, plan, hypothesize, probe, strategize, and route failures. On the 25-game ARC-AGI-3 public set, averaged over two independent runs under the official scoring protocol, DreamTeam raises the score of the state-of-the-art protocol-matched agent from 36% to 38.4% while using 31% fewer environment actions per game.

📝 Abstract

Modern agents built on frontier language models often cannot adapt their weights. What, then, remains trainable? We argue it is the agent's workspace, the structured external substrate it reads, writes, and tests; we call its evolution workspace optimization. Workspace optimization targets hard multi-turn environments where a frontier model has strong priors but cannot solve the task in a single shot, so the agent must learn through interaction. We propose a principled way to evolve the workspace, mirroring the structure of weight-space training: artifacts in place of parameters, evidence in place of data, counterexamples in place of losses, and textual feedback in place of gradients. We instantiate the idea in DreamTeam, a multi-agent harness for ARC-AGI-3 whose roles build an executable world model, plan, hypothesize, probe, strategize, and route failures. On the current 25-game ARC-AGI-3 public set under the official scoring protocol and averaged over two independent runs, DreamTeam improves the SOTA protocol-matched agent's score from 36% to 38.4%, while using 31% fewer environment actions per game.
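
The four-way analogy in the abstract (artifacts for parameters, evidence for data, counterexamples for losses, textual feedback for gradients) can be made concrete with a toy loop. The sketch below is illustrative only and is not DreamTeam's implementation: `Workspace`, `revise`, `toy_env`, and `run` are hypothetical names, the environment is a four-state counter invented for the example, and the revision step is a trivial table patch standing in for whatever a language model would do when it reads the feedback.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# All names here are hypothetical; none are taken from the paper or DreamTeam.
Transition = Tuple[int, str, int]  # (state, action, observed next state)


@dataclass
class Workspace:
    # Artifact (role of parameters): an executable world model, reduced here
    # to a lookup table from (state, action) to the predicted next state.
    world_model: Dict[Tuple[int, str], int] = field(default_factory=dict)
    # Evidence (role of data): every transition observed so far.
    evidence: List[Transition] = field(default_factory=list)
    # Textual feedback (role of gradients): notes that motivate each revision.
    feedback: List[str] = field(default_factory=list)

    def predict(self, state: int, action: str) -> int:
        # Prior for unseen pairs: assume the action leaves the state unchanged.
        return self.world_model.get((state, action), state)

    def counterexamples(self) -> List[Transition]:
        # Counterexamples (role of loss): evidence the artifact mispredicts.
        return [t for t in self.evidence if self.predict(t[0], t[1]) != t[2]]


def revise(ws: Workspace) -> int:
    """One update step: record feedback and patch the artifact per counterexample."""
    bad = ws.counterexamples()
    for state, action, observed in bad:
        ws.feedback.append(
            f"predicted {ws.predict(state, action)} for ({state}, {action!r}), "
            f"observed {observed}"
        )
        # Assumption: a real agent would rewrite the world model from the
        # feedback text; this direct patch is only a placeholder for that step.
        ws.world_model[(state, action)] = observed
    return len(bad)


def toy_env(state: int, action: str) -> int:
    # Invented environment: "inc" advances a counter modulo 4, anything else is a no-op.
    return (state + 1) % 4 if action == "inc" else state


def run(steps: int = 8) -> Tuple[Workspace, List[int]]:
    ws, state, history = Workspace(), 0, []
    for i in range(steps):
        action = "inc" if i % 2 == 0 else "noop"
        nxt = toy_env(state, action)
        ws.evidence.append((state, action, nxt))  # gather evidence by interacting
        history.append(revise(ws))                # counterexamples found this step
        state = nxt
    return ws, history
```

Calling `run()` returns the final workspace and the number of counterexamples found at each step. In this toy setting, the count reaching zero against all stored evidence plays the part that a falling loss plays in weight-space training, while the model weights (here, nothing at all) stay untouched.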

Problem

Research questions and friction points this paper is trying to address.

workspace optimization

agent training

multi-turn environments

frontier language models

external substrate

Innovation

Methods, ideas, or system contributions that make the work stand out.

workspace optimization

frontier language models

multi-agent systems