🤖 AI Summary
Current information retrieval systems are designed with human users in mind and struggle to accommodate search behaviors initiated by autonomous agents, leading to performance degradation and evaluation bias. To address this gap, this work proposes a systematic approach that leverages a multi-agent framework and diverse retrieval pipelines to collect agent-generated queries, retrieved documents, and reasoning traces on established benchmarks such as HotpotQA, Researchy Questions, and MS MARCO. We construct and release the Agentic Search Queryset (ASQ), the first dataset specifically tailored to agentic search behavior, alongside a supporting toolkit. This resource fills a critical gap in authentic interaction data for agent-driven retrieval, enables flexible extension to new agents, retrievers, and tasks, and lays the foundation for future research in agentic information retrieval.
📝 Abstract
With automated systems increasingly issuing search queries alongside humans, Information Retrieval (IR) faces a major shift. Yet IR remains human-centred, with systems, evaluation metrics, user models, and datasets designed around human queries and behaviours. Consequently, IR operates under assumptions that no longer hold in practice, as workload volumes, predictability, and querying behaviours change. This misalignment affects system performance and optimisation: caching may lose effectiveness, query pre-processing may add overhead without improving results, and standard metrics may mismeasure satisfaction. Without adaptation, retrieval models risk satisfying neither humans nor the emerging user segment of agents. However, datasets capturing agent search behaviour are lacking, which is a critical gap given IR's historical reliance on data-driven evaluation and optimisation. We develop a methodology for collecting all the data produced and consumed by agentic retrieval-augmented systems when answering queries, and we release the Agentic Search Queryset (ASQ) dataset. ASQ contains reasoning-induced queries, retrieved documents, and thoughts for queries in HotpotQA, Researchy Questions, and MS MARCO, for three diverse agents and two retrieval pipelines. The accompanying toolkit enables ASQ to be extended to new agents, retrievers, and datasets.
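To make the dataset's contents concrete, the sketch below shows what a single ASQ-style record might look like: a benchmark question paired with the agent's per-step thoughts, the reasoning-induced queries it issued, and the documents each query retrieved. All field names and values here are illustrative assumptions, not the actual ASQ schema.

```python
# Hypothetical shape of one agentic-search record: a source question plus the
# trace of (thought, query, retrieved documents) steps the agent produced.
# Field names are assumed for illustration; the released ASQ schema may differ.
record = {
    "source_dataset": "HotpotQA",   # origin benchmark (HotpotQA, Researchy Questions, or MS MARCO)
    "agent": "agent_a",             # which of the agents generated this trace (placeholder name)
    "retriever": "bm25",            # which retrieval pipeline served the queries (placeholder name)
    "question": "Which magazine was started first, Arthur's Magazine or First for Women?",
    "steps": [
        {
            "thought": "I should first find when Arthur's Magazine was founded.",
            "query": "Arthur's Magazine founding year",
            "retrieved_docs": ["doc_12", "doc_87"],
        },
        {
            "thought": "Now compare against First for Women.",
            "query": "First for Women magazine launch date",
            "retrieved_docs": ["doc_301"],
        },
    ],
}

# Reasoning-induced queries often differ from the original question,
# which is the behaviour ASQ is designed to capture.
agent_queries = [step["query"] for step in record["steps"]]
print(agent_queries)
```

A record structured this way would let downstream work compare agent-issued queries against the original human-authored question, or study how retrieval results feed back into subsequent reasoning steps.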