TRACE: Capability-Targeted Agentic Training

📅 2026-04-06
🤖 AI Summary
Large language model agents struggle to selectively strengthen the specific capabilities that particular tasks require. This work proposes an end-to-end self-improvement framework that identifies capability gaps by contrasting successful and failed execution trajectories, synthesizes a capability-targeted training environment for each gap, and trains a lightweight LoRA adapter per environment via reinforcement learning. At inference, the system routes each task to the most relevant adapter. The method improves over the base agent by 14.1 points on τ²-bench and earns seven additional perfect scores on ToolSandbox, outperforming the strongest baseline while training more efficiently under the same rollout budget.
📝 Abstract
Large Language Models (LLMs) deployed in agentic environments must exercise multiple capabilities across different task instances, where a capability is performing one or more actions in a trajectory that are necessary for successfully solving a subset of tasks in the environment. Many existing approaches either rely on synthetic training data that is not targeted to the model's actual capability deficits in the target environment or train directly on the target environment, where the model needs to implicitly learn the capabilities across tasks. We introduce TRACE (Turning Recurrent Agent failures into Capability-targeted training Environments), an end-to-end system for environment-specific agent self-improvement. TRACE contrasts successful and failed trajectories to automatically identify lacking capabilities, synthesizes a targeted training environment for each that rewards whether the capability was exercised, and trains a LoRA adapter via RL on each synthetic environment, routing to the relevant adapter at inference. Empirically, TRACE generalizes across different environments, improving over the base agent by +14.1 points on $τ^2$-bench (customer service) and +7 perfect scores on ToolSandbox (tool use), outperforming the strongest baseline by +7.4 points and +4 perfect scores, respectively. Given the same number of rollouts, TRACE scales more efficiently than baselines, outperforming GRPO and GEPA by +9.2 and +7.4 points on $τ^2$-bench.
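The contrast-and-reward loop described in the abstract can be sketched as follows. This is an illustrative mock, not the paper's implementation: the `Trajectory` class, the frequency-difference gap score, and the binary capability reward are all assumptions made for the example.

```python
# Hypothetical sketch of TRACE's gap identification and synthetic-environment
# reward; all names and the scoring heuristic are illustrative assumptions.
from collections import Counter
from dataclasses import dataclass


@dataclass
class Trajectory:
    actions: list  # actions the agent took in this rollout
    success: bool  # whether the task was solved


def identify_capability_gaps(trajectories, top_k=2):
    """Contrast successes and failures: actions frequent in successful
    trajectories but rare in failed ones flag a lacking capability."""
    ok = Counter(a for t in trajectories if t.success for a in set(t.actions))
    bad = Counter(a for t in trajectories if not t.success for a in set(t.actions))
    n_ok = sum(t.success for t in trajectories) or 1
    n_bad = sum(not t.success for t in trajectories) or 1
    # Gap score: how much more often the action appears in successes.
    gaps = {a: ok[a] / n_ok - bad[a] / n_bad for a in ok}
    return [a for a, _ in sorted(gaps.items(), key=lambda kv: -kv[1])[:top_k]]


def capability_reward(trajectory, capability):
    """Synthetic-environment reward: 1 if the capability was exercised."""
    return 1.0 if capability in trajectory.actions else 0.0
```

For example, if failed customer-service rollouts consistently skip an identity-verification step that successful ones include, that action surfaces as the top gap, and a synthetic environment rewarding it would be the training target for one LoRA adapter.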
Problem

Research questions and friction points this paper is trying to address.

agentic environments
capability deficits
targeted training
trajectory analysis
large language models
Innovation

Methods, ideas, or system contributions that make the work stand out.

capability-targeted training
trajectory contrast
synthetic training environment
LoRA adapter
reinforcement learning
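Inference-time adapter routing, the last innovation listed above, can be sketched minimally. The keyword-overlap router below is a stand-in assumption; the actual system routes among trained LoRA adapters, and the adapter names and keyword lists here are invented for illustration.

```python
# Hypothetical sketch of routing a task to the most relevant LoRA adapter;
# the keyword-overlap scoring is an illustrative stand-in, not the paper's router.
def route_adapter(task_text, adapters):
    """Pick the adapter whose capability keywords best match the task."""
    words = set(task_text.lower().split())
    return max(adapters, key=lambda name: len(words & set(adapters[name])))


# Invented example adapters, one per identified capability gap.
adapters = {
    "verify_identity_lora": ["verify", "identity", "authenticate"],
    "refund_policy_lora": ["refund", "return", "policy"],
}
```

In practice a router could instead embed the incoming task and compare it against a learned description of each capability, but the selection step is the same: score every adapter against the task and activate the best match.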