O-Researcher: An Open-Ended Deep Research Model via Multi-Agent Distillation and Agentic RL

📅 2026-01-07
🏛️ arXiv.org
📈 Citations: 1
Influential: 0
📄 PDF
🤖 AI Summary
This work addresses the performance gap between open-source and closed-source large language models, which stems largely from the scarcity of high-quality training data for the former. To bridge this gap, the authors propose an end-to-end multi-agent distillation framework that leverages collaborative agents to simulate complex tool-integrated reasoning processes, thereby automatically generating research-grade instruction data. The framework employs a two-stage training strategy combining supervised fine-tuning and agent-oriented reinforcement learning, without relying on any proprietary datasets. This approach significantly enhances the reasoning capabilities of open-source models on deep research tasks, achieving new state-of-the-art results among open-source models across multiple established benchmarks and effectively narrowing the performance disparity with closed-source counterparts.
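The pipeline the summary describes (multi-agent data synthesis, then supervised fine-tuning, then agent-oriented reinforcement learning) can be sketched at a very high level as below. This is a toy illustration only: every name (`synthesize_trajectories`, `sft_update`, `rl_update`) and the dictionary "model" are hypothetical placeholders, not the paper's actual framework or API.

```python
# Toy sketch of the described two-stage strategy on multi-agent-distilled data.
# All function names and the dict-based "model" are illustrative assumptions.

def synthesize_trajectories(n_agents=3, n_tasks=4):
    """Collaborative agents simulate tool-integrated reasoning, yielding
    (question, reasoning trace, answer) instruction examples end-to-end."""
    data = []
    for task in range(n_tasks):
        trace = [f"agent{a}: tool_call(task={task})" for a in range(n_agents)]
        data.append({"question": f"q{task}", "trace": trace, "answer": f"a{task}"})
    return data

def sft_update(model, example):
    """Stage 1: supervised fine-tuning on one distilled trajectory (toy update)."""
    model["examples_seen"] += 1
    return model

def rl_update(model, example, reward):
    """Stage 2: agent-oriented RL, weighting the update by a task-level reward."""
    model["total_reward"] += reward
    return model

def train():
    model = {"examples_seen": 0, "total_reward": 0.0}
    data = synthesize_trajectories()
    for ex in data:                          # Stage 1: SFT over all synthetic data
        model = sft_update(model, ex)
    for ex in data:                          # Stage 2: RL with a verifiable reward
        reward = 1.0 if ex["answer"] == f"a{ex['question'][1:]}" else 0.0
        model = rl_update(model, ex, reward)
    return model
```

The point of the sketch is only the ordering: data synthesis happens once up front, and the same synthesized pool drives both training stages, so no proprietary data or teacher model is required.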

📝 Abstract
The performance gap between closed-source and open-source large language models (LLMs) is largely attributed to disparities in access to high-quality training data. To bridge this gap, we introduce a novel framework for the automated synthesis of sophisticated, research-grade instructional data. Our approach centers on a multi-agent workflow where collaborative AI agents simulate complex tool-integrated reasoning to generate diverse and high-fidelity data end-to-end. Leveraging this synthesized data, we develop a two-stage training strategy that integrates supervised fine-tuning with a novel reinforcement learning method, designed to maximize model alignment and capability. Extensive experiments demonstrate that our framework empowers open-source models across multiple scales, enabling them to achieve new state-of-the-art performance on major deep research benchmarks. This work provides a scalable and effective pathway for advancing open-source LLMs without relying on proprietary data or models.
Problem

Research questions and friction points this paper is trying to address.

large language models
open-source
training data
performance gap
instructional data
Innovation

Methods, ideas, or system contributions that make the work stand out.

multi-agent distillation
agentic reinforcement learning
synthetic instructional data
open-source LLMs
tool-integrated reasoning