An Open and Reproducible Deep Research Agent for Long-Form Question Answering

📅 2025-12-15
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses open-domain long-form question answering (LFQA) with a reproducible deep-research agent framework. Methodologically, it combines open-source large language models (LLMs) with web search APIs in an iterative retrieval–reasoning–synthesis pipeline, and introduces an LLM-as-a-judge-based multi-dimensional preference-tuning mechanism that explicitly optimizes for clarity, insightfulness, and factual accuracy, enabling end-to-end optimization. Contributions include: (1) an open-source system supporting multi-turn, real-world deep research; (2) an interpretable, scalable paradigm for automated multi-dimensional evaluation and alignment; and (3) a winning entry in the NeurIPS 2025 MMU-RAG text-to-text track, with consistent improvements across all three core evaluation metrics. All code and experimental configurations are publicly released.

📝 Abstract
We present an open deep research system for long-form question answering, selected as a winning system in the text-to-text track of the MMU-RAG competition at NeurIPS 2025. The system combines an open-source large language model (LLM) with an open web search API to perform iterative retrieval, reasoning, and synthesis in real-world open-domain settings. To enhance reasoning quality, we apply preference tuning based on LLM-as-a-judge feedback that evaluates multiple aspects, including clarity, insightfulness, and factuality. Our experimental results show that the proposed method consistently improves answer quality across all three aspects. Our source code is publicly available at https://github.com/efficient-deep-research/efficient-deep-research.
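The iterative retrieval, reasoning, and synthesis loop described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: `call_llm` and `web_search` are hypothetical stubs standing in for an open-source LLM endpoint and a web search API.

```python
# Sketch of an iterative retrieval-reasoning-synthesis agent loop.
# `call_llm` and `web_search` are illustrative stubs only.

def call_llm(prompt: str) -> str:
    # Stub: a real system would query an open-source LLM here.
    if "Decide" in prompt:
        # Pretend the model requests a search until notes hold evidence.
        return "DONE" if "evidence" in prompt else "SEARCH: background facts"
    return "Answer synthesized from: " + prompt

def web_search(query: str) -> list[str]:
    # Stub: a real system would call a web search API here.
    return [f"evidence snippet for '{query}'"]

def deep_research(question: str, max_iters: int = 3) -> str:
    notes: list[str] = []
    for _ in range(max_iters):
        # Reason: ask the model whether more evidence is needed.
        decision = call_llm(f"Decide next step for: {question}\nNotes: {notes}")
        if decision.startswith("SEARCH:"):
            # Retrieve: issue the model-proposed query to the web.
            notes.extend(web_search(decision.removeprefix("SEARCH:").strip()))
        else:
            break
    # Synthesize: produce the long-form answer from accumulated notes.
    return call_llm(f"Write a long-form answer to '{question}' using {notes}")
```

In the real system each step is an LLM call with retrieved context; the stubs only make the control flow concrete.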
Problem

Research questions and friction points this paper is trying to address.

How to build an open, reproducible deep research system for long-form question answering
How to combine an open-source LLM with web search for iterative retrieval, reasoning, and synthesis
How to improve answer clarity, insightfulness, and factuality through preference tuning
Innovation

Methods, ideas, or system contributions that make the work stand out.

Open-source LLM with web search API integration
Preference tuning using LLM-as-a-judge feedback
Iterative retrieval, reasoning, and synthesis process
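The LLM-as-a-judge preference tuning above can be sketched as scoring candidate answers along the paper's three axes (clarity, insightfulness, factuality) and ordering them into preference pairs. Everything below is an assumption-laden stub: `judge_llm` stands in for a strong judge model, and the crude length-based scoring is illustrative only.

```python
# Sketch: multi-dimensional LLM-as-a-judge scoring for preference pairs.
# `judge_llm` is a hypothetical stub for a judge model prompted per dimension.

DIMENSIONS = ("clarity", "insightfulness", "factuality")

def judge_llm(answer: str, dimension: str) -> float:
    # Stub score: a real judge LLM would rate the answer on this dimension.
    return min(len(answer) / 100, 1.0) + float(dimension in answer)

def multi_dim_score(answer: str) -> float:
    # Aggregate per-dimension judge scores into one preference signal.
    return sum(judge_llm(answer, d) for d in DIMENSIONS) / len(DIMENSIONS)

def preference_pair(a: str, b: str) -> tuple[str, str]:
    # Order two candidates into (chosen, rejected) for preference tuning.
    return (a, b) if multi_dim_score(a) >= multi_dim_score(b) else (b, a)
```

Pairs built this way could feed a DPO-style objective; the paper's actual judging prompts and tuning recipe are in its released code.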