ACC: Compiling Agent Trajectories for Long-Context Training

📅 2026-05-20

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Current large language models lack effective supervision for integrating evidence that is scattered across tool responses and environment observations over multiple interaction turns, which limits their ability to model long-range dependencies. This work proposes Agent Context Compilation (ACC), which, for the first time, compiles multi-turn agent trajectories (including tool invocations and environment feedback) into end-to-end long-context question-answer pairs, constructing explicit supervision signals without additional annotation. ACC addresses a gap in standard supervised fine-tuning, which does not supervise the use of evidence across turns, and it is compatible with existing long-context extension techniques. Training Qwen3-30B-A3B with ACC yields 68.3 (+18.1) on MRCR and 77.5 (+7.6) on GraphWalks, results comparable to the much larger Qwen3-235B-A22B, while preserving competitive results on general benchmarks such as GPQA and MMLU-Pro.
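The compilation step described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the `Turn` type, the role labels, the prompt layout, and the choice to take the supervised answer from the trajectory's final answer are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Turn:
    role: str      # hypothetical role labels: "user", "assistant", "tool"
    content: str


def compile_trajectory(turns: List[Turn], final_answer: str) -> Dict[str, str]:
    """Compile a multi-turn trajectory into one long-context QA pair (sketch)."""
    # Assumption: the original question is the first user turn.
    question = next(t.content for t in turns if t.role == "user")
    # Keep only tool responses / environment observations, in their original order.
    observations = [t.content for t in turns if t.role == "tool"]
    context = "\n\n".join(
        f"[Observation {i}]\n{obs}" for i, obs in enumerate(observations, start=1)
    )
    # Assumption: the prompt places the gathered evidence before the question.
    prompt = f"{context}\n\nQuestion: {question}\nAnswer directly without using tools."
    return {"prompt": prompt, "answer": final_answer}


# Toy trajectory with made-up content.
trajectory = [
    Turn("user", "Which city hosts the headquarters of the maker of Widget X?"),
    Turn("assistant", "search('Widget X manufacturer')"),
    Turn("tool", "Widget X is manufactured by Acme Corp."),
    Turn("assistant", "search('Acme Corp headquarters')"),
    Turn("tool", "Acme Corp is headquartered in Springfield."),
]
pair = compile_trajectory(trajectory, final_answer="Springfield")
print(pair["prompt"])
```

In this sketch the intermediate tool calls are dropped, so the answer has to be derived from the two observations alone, which is the kind of cross-turn dependency the summary says ACC makes explicit.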

📝 Abstract

Recent progress in agents has renewed demand for long-context reasoning in LLMs. However, training LLMs for this capability requires costly long-document curation or heuristic context synthesis. We observe that agents produce massive trajectories when solving problems, invoking tools and receiving environment observations across many turns. The evidence needed to answer the original question is thus scattered throughout these turns, and using it requires integrating distant context segments. Nevertheless, standard agent SFT masks tool responses and trains only turn-level tool selection, creating a supervision blind spot where these scattered signals go unused. We propose Agent Context Compilation (ACC), which converts trajectories from search, software engineering, and database querying agents into long-context QA pairs that combine the original question with tool responses and environment observations gathered across multiple turns, training the model to answer directly without tool use. This makes the dependencies between the question and the evidence explicit, enabling direct supervision of long-context reasoning over distant segments without additional annotation. ACC is a simple but effective approach that can be combined with any existing long-context extension or training method, providing scalable supervised fine-tuning data. We validate ACC on long-range dependency modeling using MRCR and GraphWalks, challenging benchmarks that require cross-turn coreference resolution and graph traversal over extended contexts. Training Qwen3-30B-A3B with ACC achieves 68.3 on MRCR (+18.1) and 77.5 on GraphWalks (+7.6), results comparable to Qwen3-235B-A22B, while preserving general capabilities on GPQA, MMLU-Pro, AIME, and IFEval. Further mechanism analysis reveals that the ACC-trained model exhibits task-adaptive attention restructuring and expert specialization.
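The supervision blind spot mentioned in the abstract can be illustrated with a toy loss mask. This is only a sketch of how role-based masking is commonly done in SFT pipelines; the `Segment` type, role labels, and token ids are hypothetical and are not taken from the paper.

```python
from typing import List, Tuple

# (role, token_ids); roles and token ids are made-up placeholders.
Segment = Tuple[str, List[int]]


def standard_sft_loss_mask(segments: List[Segment]) -> List[int]:
    """Return 1 for tokens that contribute to the loss, 0 otherwise."""
    mask: List[int] = []
    for role, token_ids in segments:
        # Assumption: only assistant tokens (e.g. tool calls) are supervised;
        # user and tool-response tokens are masked out.
        keep = 1 if role == "assistant" else 0
        mask.extend([keep] * len(token_ids))
    return mask


segments = [
    ("user", [11, 12, 13]),
    ("assistant", [21, 22]),       # first tool call
    ("tool", [31, 32, 33, 34]),    # tool response, masked
    ("assistant", [41, 42, 43]),   # next tool call
]
mask = standard_sft_loss_mask(segments)
print(mask)
```

Under this kind of masking each supervised target is a single turn's tool call, so no training target directly requires combining evidence from several earlier tool responses.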

Problem

Research questions and friction points this paper is trying to address.

long-context reasoning

agent trajectories

supervised fine-tuning

distant context integration

tool-augmented LLMs

Innovation

Methods, ideas, or system contributions that make the work stand out.

Agent Context Compilation

long-context reasoning

trajectory compilation