🤖 AI Summary
This work addresses the high output stochasticity of Deep Research Agents (DRAs) under repeated queries, which hinders their deployment in high-stakes domains such as finance and healthcare. For the first time, the authors formalize DRAs as an information-acquisition Markov Decision Process and systematically identify sources of randomness across three stages: information acquisition, information compression, and inference. They develop an evaluation framework to quantify output variance and, through controlled experiments, propose two mitigation strategies, structured outputs and ensemble-based query generation, that reduce stochasticity while preserving output quality. Evaluated on the DeepSearchQA dataset, the proposed approach reduces average output randomness by 22% while maintaining high-quality research outputs.
📝 Abstract
Deep Research Agents (DRAs) are promising agentic systems that gather and synthesize information to support research across domains such as financial decision-making, medical analysis, and scientific discovery. Despite recent improvements in research quality (e.g., outcome accuracy when ground truth is available), DRA system design often overlooks a critical barrier to real-world deployment: stochasticity. Under identical queries, repeated executions of a DRA can exhibit substantial variability in research outcomes, findings, and citations. In this paper, we formalize the study of stochasticity in DRAs by modeling them as information-acquisition Markov Decision Processes. We introduce an evaluation framework that quantifies variance in the system and identify its three sources: information acquisition, information compression, and inference. Through controlled experiments, we investigate how stochasticity from these modules across different decision steps influences the variance of DRA outputs. Our results show that reducing stochasticity can improve research output quality, with inference and early-stage stochasticity contributing the most to DRA output variance. Based on these findings, we propose strategies that mitigate stochasticity while maintaining output quality via structured outputs and ensemble-based query generation. Our experiments on DeepSearchQA show that our proposed mitigation methods reduce average stochasticity by 22% while maintaining high research quality.
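The abstract does not spell out the variance metric, but the core measurement idea, scoring how much repeated runs of the same query disagree, can be sketched simply. The following is an illustrative proxy only: the function names and the choice of mean pairwise Jaccard dissimilarity over word sets are assumptions, not the paper's actual metric.

```python
from itertools import combinations

def jaccard_dissimilarity(a: str, b: str) -> float:
    """1 minus Jaccard similarity of the two outputs' lowercase word sets."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    if not sa and not sb:
        return 0.0  # two empty outputs are identical
    return 1.0 - len(sa & sb) / len(sa | sb)

def output_stochasticity(outputs: list[str]) -> float:
    """Mean pairwise dissimilarity across repeated runs of one query.

    0.0 means all runs produced lexically identical outputs;
    values near 1.0 mean the runs share almost no vocabulary.
    """
    pairs = list(combinations(outputs, 2))
    if not pairs:
        return 0.0
    return sum(jaccard_dissimilarity(a, b) for a, b in pairs) / len(pairs)

# Three hypothetical outputs from repeated executions of the same query:
runs = [
    "revenue grew 12 percent in q3",
    "q3 revenue grew 12 percent",
    "revenue declined slightly in q3",
]
score = output_stochasticity(runs)
```

A real evaluation would likely use semantic rather than lexical overlap (e.g., embedding similarity) and compare structured fields such as findings and citations separately, but the aggregation pattern, averaging a pairwise dissimilarity over all run pairs, stays the same.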