Workspace Optimization: How to Train Your Agent

📅 2026-05-10
📈 Citations: 0
✨ Influential: 0

🤖 AI Summary
This work addresses the challenge that large language model agents with frozen weights struggle to learn through interaction in complex multi-turn environments. To overcome this limitation, the authors propose Workspace Optimization, an approach that shifts training from weight space to a structured external workspace. The method substitutes artifacts, evidence, counterexamples, and textual feedback for parameters, data, losses, and gradients, respectively, thereby emulating a training loop without modifying model weights. It is instantiated in DreamTeam, a multi-agent harness for ARC-AGI-3 whose roles build an executable world model, plan, hypothesize, probe, strategize, and route failures. On the 25-game ARC-AGI-3 public set under the official scoring protocol, averaged over two independent runs, DreamTeam improves the SOTA protocol-matched agent's score from 36% to 38.4% while using 31% fewer environment actions per game.
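To put the reported numbers in perspective, the short Python sketch below separates the absolute gain (in percentage points) from the relative gain, and converts "31% fewer actions" into the fraction of baseline actions used. The function name `compare` and its arguments are illustrative only; the inputs are the figures quoted on this page.

```python
def compare(baseline_score: float, new_score: float, action_reduction: float):
    """Return (absolute gain in points, relative gain, fraction of baseline actions used)."""
    absolute_gain = new_score - baseline_score
    relative_gain = absolute_gain / baseline_score
    action_ratio = 1.0 - action_reduction
    return absolute_gain, relative_gain, action_ratio


# Figures quoted on this page: 36% -> 38.4%, with 31% fewer actions per game.
abs_gain, rel_gain, action_ratio = compare(36.0, 38.4, 0.31)
print(f"absolute gain: {abs_gain:.1f} percentage points")
print(f"relative gain: {rel_gain:.1%}")
print(f"actions used relative to baseline: {action_ratio:.0%}")
```

The score gain is therefore modest in absolute terms (2.4 points, about 6.7% relative), and it is obtained with roughly 69% of the baseline's environment actions.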
πŸ“ Abstract
Modern agents built on frontier language models often cannot adapt their weights. What, then, remains trainable? We argue it is the agent's workspace, the structured external substrate it reads, writes, and tests; we call its evolution workspace optimization. Workspace optimization targets hard multi-turn environments where a frontier model has strong priors but cannot solve the task in a single shot, so the agent must learn through interaction. We propose a principled way to evolve the workspace, mirroring the structure of weight-space training: artifacts in place of parameters, evidence in place of data, counterexamples in place of losses, and textual feedback in place of gradients. We instantiate the idea in DreamTeam, a multi-agent harness for ARC-AGI-3 whose roles build an executable world model, plan, hypothesize, probe, strategize, and route failures. On the current 25-game ARC-AGI-3 public set under the official scoring protocol and averaged over two independent runs, DreamTeam improves the SOTA protocol-matched agent's score from 36% to 38.4%, while using 31% fewer environment actions per game.
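The analogy in the abstract (artifacts for parameters, evidence for data, counterexamples for losses, textual feedback for gradients) can be made concrete with a toy loop. The Python sketch below is an illustration under strong assumptions, not DreamTeam's implementation: every name (`Workspace`, `counterexamples`, `textual_feedback`, `revise`, `optimize`) is hypothetical, the environment is a one-dimensional integer state, and a deterministic `revise` function stands in for the language-model role that would read the feedback and edit the artifact.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

# Every name below is hypothetical; nothing here is taken from DreamTeam.
Transition = Tuple[int, int, int]  # (state, action, observed next state)


@dataclass
class Workspace:
    # Artifact (role of parameters): a tiny executable world model that maps
    # each action to the state change it is believed to cause.
    rules: Dict[int, int] = field(default_factory=dict)
    # Evidence (role of data): transitions actually observed in the environment.
    evidence: List[Transition] = field(default_factory=list)

    def predict(self, state: int, action: int) -> int:
        return state + self.rules.get(action, 0)


def counterexamples(ws: Workspace) -> List[Transition]:
    """Role of the loss: evidence the current artifact fails to reproduce."""
    return [(s, a, n) for (s, a, n) in ws.evidence if ws.predict(s, a) != n]


def textual_feedback(ws: Workspace, failures: List[Transition]) -> List[str]:
    """Role of the gradient: messages describing how the artifact is wrong."""
    return [
        f"action {a} from state {s}: predicted {ws.predict(s, a)}, observed {n}"
        for (s, a, n) in failures
    ]


def revise(ws: Workspace, failures: List[Transition]) -> None:
    """Deterministic stand-in for an LLM role that would edit the artifact."""
    for s, a, n in failures:
        ws.rules[a] = n - s


def optimize(
    ws: Workspace,
    env_step: Callable[[int, int], int],
    probes: List[int],
    max_rounds: int = 5,
) -> List[str]:
    # Gather evidence by probing the environment.
    state = 0
    for action in probes:
        nxt = env_step(state, action)
        ws.evidence.append((state, action, nxt))
        state = nxt
    # Revise the artifact until no counterexamples remain.
    log: List[str] = []
    for _ in range(max_rounds):
        failures = counterexamples(ws)
        if not failures:
            break
        log.extend(textual_feedback(ws, failures))
        revise(ws, failures)
    return log


# Toy environment with a hidden rule: action 0 adds 1, action 1 subtracts 2.
hidden = {0: 1, 1: -2}
ws = Workspace()
log = optimize(ws, lambda s, a: s + hidden[a], probes=[0, 1, 0])
print(ws.rules)
print(len(log), "feedback messages")
```

No model weights change anywhere in this loop; only the workspace's `rules` and `evidence` evolve, which is the point of the analogy. In a real agent the revision step would consume the feedback strings themselves rather than the raw failing transitions.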
Problem

Research questions and friction points this paper is trying to address.

workspace optimization
agent training
multi-turn environments
frontier language models
external substrate
Innovation

Methods, ideas, or system contributions that make the work stand out.

workspace optimization
frontier language models
multi-agent systems
ARC-AGI
executable world model