🤖 AI Summary
For long-horizon, deep information-seeking and autonomous research tasks, this paper introduces an agent-oriented large language model (agentic LLM) designed for end-to-end reasoning, proactive information acquisition, and continual planning. Methodologically, the authors propose a two-stage training paradigm ("agentic mid-training + agentic post-training") integrated with a fully automated, scalable data-synthesis pipeline that requires no human annotation, and employ a sparse-activation architecture (30.5B total parameters, 3.3B activated per token). The contributions include: (1) the first end-to-end training framework tailored specifically for deep research; (2) open-sourced model weights and a comprehensive toolchain supporting research-agent development; and (3) state-of-the-art performance on benchmarks including Humanity's Last Exam and BrowseComp.
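The sparse-activation figures above mean that only a small fraction of the model's parameters participate in any single forward pass. A quick sanity check of that ratio (the parameter counts come from the summary; the snippet itself is only illustrative arithmetic):

```python
# Sparse-activation (mixture-of-experts-style) parameter budget from the summary.
total_params = 30.5e9    # 30.5B total parameters
active_params = 3.3e9    # 3.3B parameters activated per token

# Fraction of the model that is active for any single token (~10.8%).
active_fraction = active_params / total_params
print(f"{active_fraction:.1%} of parameters active per token")
```

This is the usual motivation for sparse activation: inference cost scales with the activated parameters, not the total count.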
📝 Abstract
We present Tongyi DeepResearch, an agentic large language model specifically designed for long-horizon, deep information-seeking research tasks. To incentivize autonomous deep research agency, Tongyi DeepResearch is developed through an end-to-end training framework that combines agentic mid-training and agentic post-training, enabling scalable reasoning and information seeking across complex tasks. We design a highly scalable data synthesis pipeline that is fully automatic, requires no costly human annotation, and powers every training stage. By constructing customized environments for each stage, our system enables stable and consistent interaction throughout training. Tongyi DeepResearch, with 30.5 billion total parameters of which only 3.3 billion are activated per token, achieves state-of-the-art performance across a range of agentic deep research benchmarks, including Humanity's Last Exam, BrowseComp, BrowseComp-ZH, WebWalkerQA, xbench-DeepSearch, FRAMES, and xbench-DeepSearch-2510. We open-source the model, framework, and complete solutions to empower the community.
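The "reasoning and information seeking" behavior the abstract describes is typically realized as a reason-act-observe rollout loop. A minimal ReAct-style sketch follows; every name here (`run_agent`, the `search` tool, the `finish` action, the step budget) is a hypothetical illustration under common agent-framework conventions, not the paper's actual interface:

```python
from dataclasses import dataclass, field

@dataclass
class Step:
    thought: str      # model's reasoning trace for this turn
    action: str       # tool invoked, e.g. "search"
    observation: str  # environment's response to the action

@dataclass
class Rollout:
    question: str
    steps: list = field(default_factory=list)

def run_agent(question, llm, tools, max_steps=8):
    """Hypothetical ReAct-style loop: reason, act, observe, repeat.

    `llm` maps the rollout so far to (thought, action, args);
    `tools` maps action names to callables. The agent stops when
    it emits the "finish" action or exhausts its step budget.
    """
    rollout = Rollout(question)
    for _ in range(max_steps):
        thought, action, args = llm(rollout)      # plan the next move
        if action == "finish":                    # agent decides to answer
            return args, rollout
        observation = tools[action](args)         # e.g. a web-search call
        rollout.steps.append(Step(thought, action, observation))
    return None, rollout  # budget exhausted without a final answer
```

In this framing, "end-to-end training" means the policy producing `(thought, action, args)` is optimized over whole rollouts rather than over isolated single-turn responses.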