EigentSearch-Q+: Enhancing Deep Research Agents with Structured Reasoning Tools

📅 2026-04-09

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses redundant exploration and brittle evidence aggregation in deep research agents that answer open-ended questions, problems that stem from implicit, unstructured search behavior. The authors propose Q+, a set of query and evidence processing tools integrated into the browser sub-agent of Eigent, an open-source multi-agent framework. Q+ makes query planning, search progress monitoring, and evidence extraction from long web snapshots explicit, drawing on Anthropic's "think" tool paradigm and on insights from the information-retrieval literature. Across four benchmarks (SimpleQA-Verified, FRAMES, WebWalkerQA, and X-Bench DeepSearch), Q+ improves benchmark-size-weighted average accuracy by 3.0, 3.8, and 0.6 percentage points with GPT-4.1, GPT-5.1, and Minimax M2.5 backends, respectively. Case studies suggest that it also yields more coherent tool-calling trajectories.

📝 Abstract

Deep research requires reasoning over web evidence to answer open-ended questions, and it is a core capability for AI agents. Yet many deep research agents still rely on implicit, unstructured search behavior that causes redundant exploration and brittle evidence aggregation. Motivated by Anthropic's "think" tool paradigm and insights from the information-retrieval literature, we introduce Q+, a set of query and evidence processing tools that make web search more deliberate by guiding query planning, monitoring search progress, and extracting evidence from long web snapshots. We integrate Q+ into the browser sub-agent of Eigent, an open-source, production-ready multi-agent workforce for computer use, yielding EigentSearch-Q+. Across four benchmarks (SimpleQA-Verified, FRAMES, WebWalkerQA, and X-Bench DeepSearch), Q+ improves the benchmark-size-weighted average accuracy of Eigent's browser agent by 3.0, 3.8, and 0.6 percentage points (pp) for the GPT-4.1, GPT-5.1, and Minimax M2.5 model backends, respectively. Case studies further suggest that EigentSearch-Q+ produces more coherent tool-calling trajectories by making search progress and evidence handling explicit.
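The headline numbers are benchmark-size-weighted averages, so a benchmark with more questions contributes proportionally more to the reported figure. A minimal sketch of that arithmetic follows. The benchmark names, sizes, and accuracies are hypothetical placeholders and are not taken from the paper, and the exact weighting the authors used may differ in detail.

```python
def weighted_accuracy(results):
    """Size-weighted mean accuracy.

    results maps a benchmark name to (num_questions, accuracy_percent).
    """
    total = sum(n for n, _ in results.values())
    if total == 0:
        raise ValueError("at least one benchmark must contain questions")
    return sum(n * acc for n, acc in results.values()) / total


# Hypothetical sizes and accuracies, for illustration only.
baseline = {
    "bench_a": (1000, 60.0),
    "bench_b": (800, 50.0),
    "bench_c": (100, 40.0),
    "bench_d": (100, 30.0),
}
with_tools = {
    "bench_a": (1000, 62.0),
    "bench_b": (800, 52.0),
    "bench_c": (100, 44.0),
    "bench_d": (100, 32.0),
}

base_acc = weighted_accuracy(baseline)
tool_acc = weighted_accuracy(with_tools)
gain_pp = tool_acc - base_acc  # difference in percentage points

print(f"baseline={base_acc:.1f}%  with tools={tool_acc:.1f}%  gain={gain_pp:.1f} pp")
```

With this weighting, a large gain on a small benchmark moves the aggregate less than a modest gain on a large one, which is worth keeping in mind when comparing the per-backend improvements.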

Problem

Research questions and friction points this paper is trying to address.

deep research

structured reasoning

web search

evidence aggregation

AI agents

Innovation

Methods, ideas, or system contributions that make the work stand out.

structured reasoning

deep research agents

query planning