ManuSearch: Democratizing Deep Search in Large Language Models with a Transparent and Open Multi-Agent Framework

📅 2025-05-23

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

To address the opacity and limited reproducibility of closed-source, web-augmented LLM systems, this paper introduces ManuSearch, an open-source multi-agent framework that splits deep search into three collaborating agents: a solution planning agent that iteratively formulates sub-queries, an Internet search agent that retrieves relevant documents through real-time web search, and a structured webpage reading agent that extracts key evidence from raw web content. The contributions are threefold: (1) ORION, a challenging benchmark for open-web reasoning over long-tail entities, covering both English and Chinese; (2) experiments in which ManuSearch substantially outperforms prior open-source baselines and surpasses leading closed-source systems; (3) a public release of the code and data to support reproducible, extensible research on open deep search systems.

📝 Abstract

Recent advances in web-augmented large language models (LLMs) have exhibited strong performance in complex reasoning tasks, yet these capabilities are mostly locked in proprietary systems with opaque architectures. In this work, we propose ManuSearch, a transparent and modular multi-agent framework designed to democratize deep search for LLMs. ManuSearch decomposes the search and reasoning process into three collaborative agents: (1) a solution planning agent that iteratively formulates sub-queries, (2) an Internet search agent that retrieves relevant documents via real-time web search, and (3) a structured webpage reading agent that extracts key evidence from raw web content. To rigorously evaluate deep reasoning abilities, we introduce ORION, a challenging benchmark focused on open-web reasoning over long-tail entities, covering both English and Chinese. Experimental results show that ManuSearch substantially outperforms prior open-source baselines and even surpasses leading closed-source systems. Our work paves the way for reproducible, extensible research in open deep search systems. We release the data and code at https://github.com/RUCAIBox/ManuSearch.
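
The three-agent decomposition described in the abstract can be pictured as a simple control loop. The sketch below is only an illustration under stated assumptions, not the ManuSearch implementation: every name in it (`SearchState`, `run_deep_search`, the `toy_*` agents, the `example.com` URLs) is hypothetical, and the agents are stubs standing in for LLM calls and a real search API.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Evidence:
    sub_query: str
    url: str
    snippet: str


@dataclass
class SearchState:
    question: str
    asked: List[str] = field(default_factory=list)       # sub-queries issued so far
    evidence: List[Evidence] = field(default_factory=list)


# Hypothetical agent interfaces (assumptions, not the paper's API).
Planner = Callable[[SearchState], Optional[str]]  # next sub-query, or None when done
Searcher = Callable[[str], List[str]]             # sub-query -> candidate URLs
Reader = Callable[[str, str], Optional[str]]      # (sub-query, url) -> evidence or None


def run_deep_search(question: str, planner: Planner, searcher: Searcher,
                    reader: Reader, max_steps: int = 5) -> SearchState:
    """Iterate plan -> search -> read until the planner stops or the budget ends."""
    state = SearchState(question)
    for _ in range(max_steps):
        sub_query = planner(state)
        if sub_query is None:
            break
        state.asked.append(sub_query)
        for url in searcher(sub_query):
            snippet = reader(sub_query, url)
            if snippet:
                state.evidence.append(Evidence(sub_query, url, snippet))
    return state


# Stub agents so the loop can run without network access or a model.
def toy_planner(state: SearchState) -> Optional[str]:
    plan = ["who founded ExampleCorp", "birthplace of ExampleCorp founder"]
    for query in plan:
        if query not in state.asked:
            return query
    return None


def toy_searcher(sub_query: str) -> List[str]:
    return ["https://example.com/" + sub_query.replace(" ", "-")]


def toy_reader(sub_query: str, url: str) -> Optional[str]:
    return f"stub evidence for '{sub_query}'"


state = run_deep_search("Where was the founder of ExampleCorp born?",
                        toy_planner, toy_searcher, toy_reader)
for item in state.evidence:
    print(item.sub_query, "->", item.url)
```

Keeping the three roles behind plain callables means any one of them can be swapped (a different search backend, a different reader) without touching the loop, which is one way to read the "modular" claim in the abstract.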

Problem

Research questions and friction points this paper is trying to address.

Democratizing deep search in LLMs with a transparent multi-agent framework

Decomposing search and reasoning into collaborative, specialized agents

Introducing a benchmark for evaluating open-web reasoning over long-tail entities

Innovation

Methods, ideas, or system contributions that make the work stand out.

Modular multi-agent framework for deep search

Real-time web search integration for LLMs

Structured webpage reading for evidence extraction
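
To make "structured webpage reading" concrete, here is a minimal sketch of one possible approach, assuming the reader first splits raw HTML into text blocks and then keeps the blocks that overlap most with the sub-query. This is a simplified keyword heuristic chosen for illustration, not the method used in the paper; `BlockExtractor` and `extract_evidence` are hypothetical names, and only Python's standard `html.parser` module is used.

```python
from html.parser import HTMLParser
from typing import List, Tuple


class BlockExtractor(HTMLParser):
    """Collect visible text per block-level tag, skipping script and style."""

    BLOCK_TAGS = {"p", "li", "h1", "h2", "h3", "td"}
    SKIP_TAGS = {"script", "style"}

    def __init__(self) -> None:
        super().__init__()
        self.blocks: List[str] = []
        self._buf: List[str] = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP_TAGS:
            self._skip_depth += 1
        elif tag in self.BLOCK_TAGS:
            self.flush()

    def handle_endtag(self, tag):
        if tag in self.SKIP_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1
        elif tag in self.BLOCK_TAGS:
            self.flush()

    def handle_data(self, data):
        if self._skip_depth == 0:
            self._buf.append(data)

    def flush(self) -> None:
        """Close the current block, normalizing whitespace."""
        text = " ".join("".join(self._buf).split())
        if text:
            self.blocks.append(text)
        self._buf = []


def extract_evidence(html: str, query: str, top_k: int = 2) -> List[Tuple[int, str]]:
    """Return up to top_k (score, block) pairs ranked by shared query terms."""
    parser = BlockExtractor()
    parser.feed(html)
    parser.close()
    parser.flush()  # keep any trailing text outside a block tag
    terms = set(query.lower().split())
    scored = [(len(terms & set(block.lower().split())), block)
              for block in parser.blocks]
    scored = [pair for pair in scored if pair[0] > 0]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]


page = (
    "<html><head><style>p{color:red}</style></head><body>"
    "<h1>ExampleCorp history</h1>"
    "<p>ExampleCorp was founded in 1999.</p>"
    "<p>Unrelated cookie notice.</p>"
    "<script>var x = 'founded';</script>"
    "</body></html>"
)
result = extract_evidence(page, "ExampleCorp founded")
for score, block in result:
    print(score, block)
```

A real reading agent would likely use an LLM to judge relevance rather than term overlap; the point here is only the shape of the step: raw page in, a short ranked list of evidence out.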

🔎 Similar Papers

AutoML-Agent: A Multi-Agent LLM Framework for Full-Pipeline AutoML