DeepAgent: A General Reasoning Agent with Scalable Toolsets

📅 2025-10-24
🤖 AI Summary
Existing agent frameworks rely on predefined workflows, which limits their ability to autonomously complete real-world tasks that require external tools and long-horizon interaction. This paper proposes DeepAgent, an end-to-end deep reasoning agent that performs autonomous thinking, tool discovery, and action execution within a single reasoning process. Key contributions include: (1) an autonomous memory folding mechanism that compresses interaction history into structured episodic, working, and tool memories; and (2) ToolPO, an end-to-end reinforcement learning (RL) strategy that trains on LLM-simulated APIs and assigns fine-grained credit to tool invocation tokens through tool-call advantage attribution. Evaluated on eight benchmarks covering general tool use and downstream applications, DeepAgent consistently outperforms baselines in both labeled-tool and open-set tool retrieval settings.

📝 Abstract
Large reasoning models have demonstrated strong problem-solving abilities, yet real-world tasks often require external tools and long-horizon interactions. Existing agent frameworks typically follow predefined workflows, which limit autonomous and global task completion. In this paper, we introduce DeepAgent, an end-to-end deep reasoning agent that performs autonomous thinking, tool discovery, and action execution within a single, coherent reasoning process. To address the challenges of long-horizon interactions, particularly the context length explosion from multiple tool calls and the accumulation of interaction history, we introduce an autonomous memory folding mechanism that compresses past interactions into structured episodic, working, and tool memories, reducing error accumulation while preserving critical information. To teach general-purpose tool use efficiently and stably, we develop an end-to-end reinforcement learning strategy, namely ToolPO, that leverages LLM-simulated APIs and applies tool-call advantage attribution to assign fine-grained credit to the tool invocation tokens. Extensive experiments on eight benchmarks, including general tool-use tasks (ToolBench, API-Bank, TMDB, Spotify, ToolHop) and downstream applications (ALFWorld, WebShop, GAIA, HLE), demonstrate that DeepAgent consistently outperforms baselines across both labeled-tool and open-set tool retrieval scenarios. This work takes a step toward more general and capable agents for real-world applications. The code and demo are available at https://github.com/RUC-NLPIR/DeepAgent.
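
The memory folding idea described above can be illustrated with a small sketch. The abstract names three memory types (episodic, working, tool) but does not specify their layout or how the compression is performed, so everything below is an assumption: the `FoldedMemory` container, the `fold_history` function, the step dictionary keys, and the `keep_recent` parameter are all hypothetical. Plain truncation stands in for whatever summarization the actual system uses.

```python
from dataclasses import dataclass, field


@dataclass
class FoldedMemory:
    """Hypothetical container for the three memory types named in the abstract."""
    episodic: list[str] = field(default_factory=list)  # older key events
    working: list[str] = field(default_factory=list)   # most recent steps
    tool: dict[str, list[str]] = field(default_factory=dict)  # notes per tool


def fold_history(history: list[dict], keep_recent: int = 2) -> FoldedMemory:
    """Compress interaction steps into a FoldedMemory.

    Each step is assumed to be a dict with keys "thought", "tool", "result".
    Truncation is a stand-in for a real summarizer (an assumption).
    """
    memory = FoldedMemory()
    cutoff = len(history) - keep_recent
    for i, step in enumerate(history):
        summary = f"step {i}: {step['thought'][:60]}"
        # The last `keep_recent` steps go to working memory, the rest to episodic.
        if i >= cutoff:
            memory.working.append(summary)
        else:
            memory.episodic.append(summary)
        # Steps that called a tool also leave a short note under that tool's name.
        if step.get("tool"):
            memory.tool.setdefault(step["tool"], []).append(step["result"][:40])
    return memory


# Usage with made-up tool names and results.
history = [
    {"thought": "search for movie", "tool": "search_tool", "result": "found 3 candidates"},
    {"thought": "pick best", "tool": None, "result": ""},
    {"thought": "fetch details", "tool": "details_tool", "result": "runtime 120"},
]
folded = fold_history(history, keep_recent=1)
print(folded.episodic, folded.working, folded.tool)
```

The point of the sketch is only the shape of the output: a long step list is replaced by three compact structures, so the context passed to the model no longer grows with every tool call.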

Problem

Research questions and friction points this paper is trying to address.

Autonomous reasoning with scalable toolsets for real-world tasks
Compressing long interaction histories to reduce error accumulation
Efficiently teaching general-purpose tool use via reinforcement learning

Innovation

Methods, ideas, or system contributions that make the work stand out.

Autonomous memory folding mechanism compresses past interactions
End-to-end reinforcement learning strategy for tool use
Single coherent reasoning process integrates thinking and execution
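
The abstract states that ToolPO applies tool-call advantage attribution to assign fine-grained credit to tool invocation tokens, without giving the formula. One plausible reading, sketched below under stated assumptions, is that every token receives a trajectory-level advantage for task outcome while an extra tool-call advantage is added only on tokens that belong to a tool invocation. The function names, the mean-centered baseline, and the additive combination are illustrative assumptions, not the paper's confirmed method.

```python
def center(rewards: list[float]) -> list[float]:
    """Subtract the group mean from each reward (an assumed simple baseline)."""
    mean = sum(rewards) / len(rewards)
    return [r - mean for r in rewards]


def token_advantages(task_adv: float, tool_adv: float, tool_mask: list[int]) -> list[float]:
    """Per-token advantage for one sampled trajectory.

    task_adv  : advantage from overall task success, shared by all tokens
    tool_adv  : advantage from tool-call quality
    tool_mask : 1 for tokens inside a tool invocation, 0 elsewhere
    """
    return [task_adv + tool_adv * m for m in tool_mask]


# Two hypothetical rollouts of the same task: the first succeeds, the second fails.
task_advs = center([1.0, 0.0])
# Both rollouts receive a tool-call reward; the first made better calls.
tool_advs = center([1.0, 0.5])

# Tokens 1 and 2 of the first rollout form its tool call.
mask = [0, 1, 1, 0]
per_token = token_advantages(task_advs[0], tool_advs[0], mask)
print(per_token)
```

Under this reading, reasoning tokens are credited only for the final outcome, while tool-call tokens also carry the signal about whether the call itself was good, which is what makes the credit assignment fine-grained.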