AI Summary
This work addresses the challenge that reasoning-driven queries in deep-search agents often misalign with web indexing structures, yielding retrieval results that are either too coarse or too specific to support precise evidence extraction. To bridge this gap, the authors propose WeDas, a framework that integrates the structural distribution of web content into the agent's observation space. WeDas evaluates the compatibility between query intent and retrieval results through a Query-Result Alignment Score, and it uses a few-shot probing mechanism to perceive local content distributions without requiring full index access, which makes it usable as a plug-and-play retrieval module. The approach supports dynamic sub-goal calibration, linking high-level reasoning with low-level retrieval. Evaluated on four benchmarks, WeDas consistently improves both sub-goal completion and answer accuracy.
Abstract
Despite the integration of search tools, Deep Search Agents often suffer from a misalignment between reasoning-driven queries and the underlying web indexing structures. Existing frameworks treat the search engine as a static utility, leading to queries that are either too coarse or too granular to retrieve precise evidence. We propose WeDas, a Web Content Distribution Aware framework that incorporates search-space structural characteristics into the agent's observation space. Central to our method is the Query-Result Alignment Score, a metric quantifying the compatibility between agent intent and retrieval outcomes. To overcome the intractability of indexing the dynamic web, we introduce a few-shot probing mechanism that iteratively estimates this score via limited query accesses, allowing the agent to dynamically recalibrate sub-goals based on the local content landscape. As a plug-and-play module, WeDas consistently improves sub-goal completion and accuracy across four benchmarks, effectively bridging the gap between high-level reasoning and low-level retrieval.
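The abstract does not define the Query-Result Alignment Score or the probing procedure, so the following is only a toy sketch of the general idea: issue a few candidate queries, score how well the top results match the intent, and keep the best-aligned query. The names `alignment_score` and `probe`, the term-overlap scoring rule, and the `search` callable are all assumptions made for illustration and are not taken from WeDas.

```python
from typing import Callable, Iterable, List, Optional, Set, Tuple


def alignment_score(intent_terms: Set[str], results: List[str]) -> float:
    """Toy stand-in for an alignment score (assumption, not the paper's metric).

    Returns the mean fraction of intent terms that appear in each result
    snippet. Intent terms are expected to be lowercase.
    """
    if not results or not intent_terms:
        return 0.0
    total = 0.0
    for snippet in results:
        words = set(snippet.lower().split())
        total += len(intent_terms & words) / len(intent_terms)
    return total / len(results)


def probe(
    intent_terms: Set[str],
    candidate_queries: Iterable[str],
    search: Callable[[str], List[str]],
    k: int = 3,
) -> Tuple[Optional[str], float]:
    """Issue each candidate query once and keep the best-aligned one.

    `search` is a hypothetical callable that returns result snippets for a
    query; only the top-k snippets are scored, mimicking limited access.
    """
    best_query: Optional[str] = None
    best_score = -1.0
    for query in candidate_queries:
        score = alignment_score(intent_terms, search(query)[:k])
        if score > best_score:
            best_query, best_score = query, score
    return best_query, best_score
```

In this sketch, a low score on every candidate would be the signal for the agent to rewrite its sub-goal (broader or narrower) and probe again. How WeDas actually performs that recalibration is not specified in the abstract.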