🤖 AI Summary
Existing ReAct-based deep research agents follow a linear design that makes it hard to backtrack to earlier states, explore alternative search paths, or maintain global awareness over long contexts, so they often get trapped in local optima and repeat searches. To address these limitations, this work proposes Re-TRAC, a recursive trajectory compression framework that, after each trajectory, generates a structured state representation summarizing accumulated evidence, uncertainties, failures, and future plans, and conditions subsequent trajectories on it. This enables cross-trajectory reflection and globally informed planning. For smaller models, the authors add Re-TRAC-aware supervised fine-tuning. On BrowseComp, Re-TRAC outperforms standard ReAct by 15–20% with frontier LLMs, its tool calls and token usage decrease monotonically across rounds, and the fine-tuned smaller models reach state-of-the-art results among models of comparable scale.
📝 Abstract
LLM-based deep research agents are largely built on the ReAct framework. This linear design makes it difficult to revisit earlier states, branch into alternative search directions, or maintain global awareness under long contexts, often leading to local optima, redundant exploration, and inefficient search. We propose Re-TRAC, an agentic framework that performs cross-trajectory exploration by generating a structured state representation after each trajectory to summarize evidence, uncertainties, failures, and future plans, and conditioning subsequent trajectories on this state representation. This enables iterative reflection and globally informed planning, reframing research as a progressive process. Empirical results show that Re-TRAC consistently outperforms ReAct by 15–20% on BrowseComp with frontier LLMs. For smaller models, we introduce Re-TRAC-aware supervised fine-tuning, achieving state-of-the-art performance at comparable scales. Notably, Re-TRAC shows a monotonic reduction in tool calls and token usage across rounds, indicating progressively targeted exploration driven by cross-trajectory reflection rather than redundant search.
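The cross-trajectory loop described in the abstract can be pictured with a minimal Python sketch. This is an illustration under stated assumptions, not the authors' implementation: the `ResearchState` field names, the `run_trajectory` and `compress` callables, the `re_trac_loop` function, and the fixed-round stopping rule are all hypothetical. The abstract specifies only that a structured state (evidence, uncertainties, failures, future plans) is generated after each trajectory and that later trajectories are conditioned on it.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple


@dataclass
class ResearchState:
    """Structured summary carried between trajectories (field names are assumptions)."""
    evidence: List[str] = field(default_factory=list)
    uncertainties: List[str] = field(default_factory=list)
    failures: List[str] = field(default_factory=list)
    plans: List[str] = field(default_factory=list)

    def to_prompt(self) -> str:
        """Render the state as text that a later trajectory could be conditioned on."""
        sections = [
            ("Evidence", self.evidence),
            ("Uncertainties", self.uncertainties),
            ("Failures", self.failures),
            ("Plans", self.plans),
        ]
        return "\n".join(
            f"{name}: " + ("; ".join(items) if items else "(none)")
            for name, items in sections
        )


# Hypothetical interfaces standing in for the agent and the summarizer:
# (question, prior state) -> (trajectory transcript, candidate answer or None)
RunTrajectory = Callable[[str, Optional[ResearchState]], Tuple[str, Optional[str]]]
# (trajectory transcript, prior state) -> updated state
Compress = Callable[[str, Optional[ResearchState]], ResearchState]


def re_trac_loop(
    question: str,
    run_trajectory: RunTrajectory,
    compress: Compress,
    max_rounds: int = 3,
) -> Tuple[Optional[str], Optional[ResearchState]]:
    """Run several trajectories, compressing each into a state for the next.

    Assumption: a fixed number of rounds, keeping the most recent candidate
    answer. The abstract does not state the actual stopping rule.
    """
    state: Optional[ResearchState] = None
    answer: Optional[str] = None
    for _ in range(max_rounds):
        # Condition this trajectory on the state from all earlier ones.
        transcript, candidate = run_trajectory(question, state)
        # Compress the finished trajectory into an updated structured state.
        state = compress(transcript, state)
        if candidate is not None:
            answer = candidate
    return answer, state
```

In this sketch `run_trajectory` would wrap a single ReAct-style rollout that receives `state.to_prompt()` in its context, and `compress` would wrap an LLM call that produces the updated summary. Keeping both as plain callables leaves the loop independent of any particular model or tool stack.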