DualSpec: Accelerating Deep Research Agents via Dual-Process Action Speculation

📅 2026-03-08
🤖 AI Summary
This work addresses the high latency that large language model-driven deep research agents incur on long-horizon information-seeking tasks, which stems mainly from frequent tool invocations and extended reasoning. The authors propose DualSpec, a framework that exploits the heterogeneity between Search and Visit actions in reasoning demands and model capacity requirements. DualSpec introduces a dual-process speculative mechanism that overlaps reasoning with action execution to reduce latency. Guided by an entropy-based analysis of action uncertainty, the framework pairs differentiated speculative execution for the two action types with a lightweight, confidence-based semantic verifier. Experiments show that DualSpec achieves up to 3.28× end-to-end speedup across multiple models and benchmarks while maintaining accuracy comparable to that of full-reasoning agents.
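To make the entropy comparison concrete, the sketch below computes Shannon entropy for two made-up distributions over candidate next actions. The numbers and the names `search_like` and `visit_like` are illustrative assumptions, not data from the paper, and the paper's actual entropy measurement may be defined differently (for example, over token-level probabilities).

```python
import math

def shannon_entropy(probs):
    """Shannon entropy (in nats) of a discrete distribution.

    The input is renormalized so that it sums to 1 before use.
    """
    total = sum(probs)
    if total <= 0:
        raise ValueError("probabilities must sum to a positive value")
    normalized = [p / total for p in probs]
    return -sum(p * math.log(p) for p in normalized if p > 0)

# Illustrative (made-up) distributions over four candidate actions.
search_like = [0.30, 0.25, 0.25, 0.20]  # several plausible queries: high uncertainty
visit_like = [0.90, 0.05, 0.03, 0.02]   # one dominant URL: low uncertainty

search_entropy = shannon_entropy(search_like)
visit_entropy = shannon_entropy(visit_like)
```

Under these assumed numbers the Search-like distribution has the higher entropy, which matches the qualitative pattern the summary describes.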

📝 Abstract
Large language model-based deep research agents have become increasingly popular for addressing long-horizon information-seeking tasks, but they often incur high end-to-end latency due to extensive reasoning and frequent tool use. Speculation frameworks aim to reduce latency by overlapping action execution with reasoning; however, existing approaches typically rely on uniform speculation strategies and strict action matching, which limits both inference speedups and robustness. In this work, we revisit the speculate-verify paradigm for deep research agents through the lens of action heterogeneity. We show that Search and Visit actions exhibit fundamentally different reasoning and model capacity requirements: entropy-based analysis reveals that Search decisions have higher uncertainty and benefit significantly from explicit reasoning, whereas Visit decisions have lower entropy and depend primarily on model capacity. Motivated by this dual-process characteristic, we propose DualSpec, a heterogeneous speculation framework equipped with a lightweight, confidence-based semantic verifier. Experiments across multiple models and benchmarks demonstrate that DualSpec achieves up to 3.28× end-to-end speedup while maintaining accuracy comparable to fully reasoning agents.
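The speculate-verify idea can be sketched as a single agent step in which a tool call for a cheaply drafted action runs while full reasoning is still in progress. This is a generic illustration, not DualSpec's implementation: `speculative_step`, `jaccard_confidence`, `fake_tool`, and the 0.5 threshold are all hypothetical, and token-overlap (Jaccard) similarity stands in for the paper's confidence-based semantic verifier, whose details the abstract does not give.

```python
from concurrent.futures import ThreadPoolExecutor

def jaccard_confidence(a: str, b: str) -> float:
    """Stand-in verifier score: token overlap between two actions, in [0, 1]."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)

def speculative_step(draft_action, reason_fn, execute_fn, threshold=0.5):
    """Overlap execution of a drafted action with full reasoning.

    draft_action: action proposed by a cheap fast path (assumed given).
    reason_fn:    callable returning the action chosen after full reasoning.
    execute_fn:   callable that runs an action, e.g. a tool call.
    Returns (result, accepted).
    """
    with ThreadPoolExecutor(max_workers=2) as pool:
        speculative = pool.submit(execute_fn, draft_action)  # starts immediately
        target_action = pool.submit(reason_fn).result()      # runs concurrently
        # Non-strict matching: accept if the actions are similar enough.
        if jaccard_confidence(draft_action, target_action) >= threshold:
            return speculative.result(), True
    # Rejected: the speculative call has finished and its result is discarded.
    return execute_fn(target_action), False

def fake_tool(action: str) -> str:
    """Hypothetical tool that echoes the action it was asked to run."""
    return f"results for: {action}"

result, accepted = speculative_step(
    "search dualspec latency benchmark",
    lambda: "search DualSpec latency benchmark results",
    fake_tool,
)
```

When the draft is accepted, the tool latency is hidden behind the reasoning time; when it is rejected, the step pays for the verified action's execution as it would without speculation. Accepting on a similarity score rather than exact string equality is what distinguishes this from the strict action matching the abstract criticizes.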
Problem

Research questions and friction points this paper is trying to address.

deep research agents
action speculation
latency reduction
heterogeneous actions
speculate-verify paradigm
Innovation

Methods, ideas, or system contributions that make the work stand out.

dual-process speculation
action heterogeneity
semantic verification
deep research agents
latency reduction