O-Researcher: An Open-Ended Deep Research Model via Multi-Agent Distillation and Agentic RL

📅 2026-01-07
🏛️ arXiv.org
📈 Citations: 1
Influential: 0
📄 PDF
🤖 AI Summary
This work addresses the performance gap between open-source and closed-source large language models, which stems largely from the scarcity of high-quality training data for the former. To bridge this gap, the authors propose an end-to-end multi-agent distillation framework that leverages collaborative agents to simulate complex tool-integrated reasoning processes, thereby automatically generating research-grade instruction data. The framework employs a two-stage training strategy combining supervised fine-tuning and agent-oriented reinforcement learning, without relying on any proprietary datasets. This approach significantly enhances the reasoning capabilities of open-source models on deep research tasks, achieving new state-of-the-art results among open-source models across multiple established benchmarks and effectively narrowing the performance disparity with closed-source counterparts.
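The pipeline the summary describes (multi-agent data synthesis, then supervised fine-tuning, then agent-oriented reinforcement learning) can be sketched at a very high level as below. This is a toy illustration only: every name (`synthesize_trajectories`, `sft_update`, `rl_update`) and the dictionary "model" are hypothetical placeholders, not the paper's actual framework or API.

```python
# Toy sketch of the described two-stage strategy on multi-agent-distilled data.
# All function names and the dict-based "model" are illustrative assumptions.

def synthesize_trajectories(n_agents=3, n_tasks=4):
    """Collaborative agents simulate tool-integrated reasoning, yielding
    (question, reasoning trace, answer) instruction examples end-to-end."""
    data = []
    for task in range(n_tasks):
        trace = [f"agent{a}: tool_call(task={task})" for a in range(n_agents)]
        data.append({"question": f"q{task}", "trace": trace, "answer": f"a{task}"})
    return data

def sft_update(model, example):
    """Stage 1: supervised fine-tuning on one distilled trajectory (toy update)."""
    model["examples_seen"] += 1
    return model

def rl_update(model, example, reward):
    """Stage 2: agent-oriented RL, weighting the update by a task-level reward."""
    model["total_reward"] += reward
    return model

def train():
    model = {"examples_seen": 0, "total_reward": 0.0}
    data = synthesize_trajectories()
    for ex in data:                          # Stage 1: SFT over all synthetic data
        model = sft_update(model, ex)
    for ex in data:                          # Stage 2: RL with a verifiable reward
        reward = 1.0 if ex["answer"] == f"a{ex['question'][1:]}" else 0.0
        model = rl_update(model, ex, reward)
    return model
```

The point of the sketch is only the ordering: data synthesis happens once up front, and the same synthesized pool drives both training stages, so no proprietary data or teacher model is required.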

📝 Abstract
The performance gap between closed-source and open-source large language models (LLMs) is largely attributed to disparities in access to high-quality training data. To bridge this gap, we introduce a novel framework for the automated synthesis of sophisticated, research-grade instructional data. Our approach centers on a multi-agent workflow where collaborative AI agents simulate complex tool-integrated reasoning to generate diverse and high-fidelity data end-to-end. Leveraging this synthesized data, we develop a two-stage training strategy that integrates supervised fine-tuning with a novel reinforcement learning method, designed to maximize model alignment and capability. Extensive experiments demonstrate that our framework empowers open-source models across multiple scales, enabling them to achieve new state-of-the-art performance on major deep research benchmarks. This work provides a scalable and effective pathway for advancing open-source LLMs without relying on proprietary data or models.
Problem

Research questions and friction points this paper is trying to address.

large language models
open-source
training data
performance gap
instructional data
Innovation

Methods, ideas, or system contributions that make the work stand out.

multi-agent distillation
agentic reinforcement learning
synthetic instructional data
open-source LLMs
tool-integrated reasoning