🤖 AI Summary
For long-horizon, deep information-seeking and autonomous research tasks, this paper introduces an agent-oriented large language model (agentic LLM) designed for end-to-end reasoning, proactive information acquisition, and continual planning. Methodologically, the authors propose a two-stage training paradigm ("agentic mid-training + agentic post-training") integrated with a fully automated, scalable data-synthesis pipeline that requires no human annotation, and employ a sparse-activation architecture (30.5B total parameters, 3.3B activated per token). The contributions include: (1) the first end-to-end training framework tailored specifically for deep research; (2) open-sourced model weights and a comprehensive toolchain supporting research-agent development; and (3) state-of-the-art performance on benchmarks including Humanity's Last Exam and BrowseComp.
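The sparse-activation figures above mean that only a small fraction of the model's parameters participate in any single forward pass. A quick sanity check of that ratio (the parameter counts come from the summary; the snippet itself is only illustrative arithmetic):

```python
# Sparse-activation (mixture-of-experts-style) parameter budget from the summary.
total_params = 30.5e9    # 30.5B total parameters
active_params = 3.3e9    # 3.3B parameters activated per token

# Fraction of the model that is active for any single token (~10.8%).
active_fraction = active_params / total_params
print(f"{active_fraction:.1%} of parameters active per token")
```

This is the usual motivation for sparse activation: inference cost scales with the activated parameters, not the total count.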
📝 Abstract
We present Tongyi DeepResearch, an agentic large language model specifically designed for long-horizon, deep information-seeking research tasks. To incentivize autonomous deep research agency, Tongyi DeepResearch is developed through an end-to-end training framework that combines agentic mid-training and agentic post-training, enabling scalable reasoning and information seeking across complex tasks. We design a highly scalable data synthesis pipeline that is fully automatic, requires no costly human annotation, and powers every training stage. By constructing customized environments for each stage, our system enables stable and consistent interaction throughout training. Tongyi DeepResearch, with 30.5 billion total parameters of which only 3.3 billion are activated per token, achieves state-of-the-art performance across a range of agentic deep research benchmarks, including Humanity's Last Exam, BrowseComp, BrowseComp-ZH, WebWalkerQA, xbench-DeepSearch, FRAMES, and xbench-DeepSearch-2510. We open-source the model, framework, and complete solutions to empower the community.
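The "reasoning and information seeking" behavior the abstract describes is typically realized as a reason-act-observe rollout loop. A minimal ReAct-style sketch follows; every name here (`run_agent`, the `search` tool, the `finish` action, the step budget) is a hypothetical illustration under common agent-framework conventions, not the paper's actual interface:

```python
from dataclasses import dataclass, field

@dataclass
class Step:
    thought: str      # model's reasoning trace for this turn
    action: str       # tool invoked, e.g. "search"
    observation: str  # environment's response to the action

@dataclass
class Rollout:
    question: str
    steps: list = field(default_factory=list)

def run_agent(question, llm, tools, max_steps=8):
    """Hypothetical ReAct-style loop: reason, act, observe, repeat.

    `llm` maps the rollout so far to (thought, action, args);
    `tools` maps action names to callables. The agent stops when
    it emits the "finish" action or exhausts its step budget.
    """
    rollout = Rollout(question)
    for _ in range(max_steps):
        thought, action, args = llm(rollout)      # plan the next move
        if action == "finish":                    # agent decides to answer
            return args, rollout
        observation = tools[action](args)         # e.g. a web-search call
        rollout.steps.append(Step(thought, action, observation))
    return None, rollout  # budget exhausted without a final answer
```

In this framing, "end-to-end training" means the policy producing `(thought, action, args)` is optimized over whole rollouts rather than over isolated single-turn responses.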