🤖 AI Summary
Current web search systems struggle to support deep reasoning and broad structured aggregation at the same time, often falling short in cross-entity consistency, coverage breadth, and long-horizon inference. This work proposes Web2BigTable, a two-tier multi-agent framework in which a high-level orchestrator decomposes tasks into sub-problems and low-level worker agents solve them in parallel, with decomposition and execution jointly improved through a closed-loop execution–verification–reflection mechanism. The approach combines persistent, human-readable external memory with self-evolving updates to each agent, so that a single framework covers both depth-oriented and breadth-oriented search. On WideSearch, it achieves an Avg@4 Success Rate of 38.50, about 7.5 times the second-best result of 5.10, along with Row F1 and Item F1 scores of 63.53 and 80.12, respectively; on XBench-DeepSearch, it attains an accuracy of 73.0%.
📝 Abstract
Agentic web search increasingly faces two distinct demands: deep reasoning over a single target, and structured aggregation across many entities and heterogeneous sources. Current systems struggle on both fronts. Breadth-oriented tasks demand schema-aligned outputs with wide coverage and cross-entity consistency, while depth-oriented tasks require coherent reasoning over long, branching search trajectories. We introduce \textbf{Web2BigTable}, a multi-agent framework for web-to-table search that supports both regimes. Web2BigTable adopts a bi-level architecture in which an upper-level orchestrator decomposes the task into sub-problems and lower-level worker agents solve them in parallel. Through a closed-loop run--verify--reflect process, the framework jointly improves decomposition and execution over time, recording what it learns in persistent, human-readable external memory and applying self-evolving updates to each individual agent. During execution, workers coordinate through a shared workspace that makes partial findings visible, allowing them to reduce redundant exploration, reconcile conflicting evidence, and adapt to emerging coverage gaps. Web2BigTable sets a new state of the art on WideSearch, reaching an Avg@4 Success Rate of \textbf{38.50} ($7.5\times$ the second best at 5.10), Row F1 of \textbf{63.53} (+25.03 over the second best), and Item F1 of \textbf{80.12} (+14.42 over the second best). It also generalises to depth-oriented search on XBench-DeepSearch, achieving 73.0\% accuracy. Code is available at https://github.com/web2bigtable/web2bigtable.
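The bi-level pattern described in the abstract can be illustrated with a toy sketch. This is not the Web2BigTable implementation: the names `Workspace`, `decompose`, `lookup`, `worker`, and `run` are hypothetical, the web search is replaced by a stub, and the reflect and memory-update steps are omitted. It only shows an orchestrator splitting a task, parallel workers sharing partial findings to avoid redundant work, and a simple coverage check.

```python
from concurrent.futures import ThreadPoolExecutor
from threading import Lock


class Workspace:
    """Hypothetical shared store that makes partial findings visible to all workers."""

    def __init__(self):
        self._lock = Lock()
        self.rows = {}  # entity -> row dict (None while work is in progress)

    def claim(self, entity):
        """Return True only for the first worker that takes this entity."""
        with self._lock:
            if entity in self.rows:
                return False
            self.rows[entity] = None
            return True

    def publish(self, entity, row):
        with self._lock:
            self.rows[entity] = row


def decompose(entities, n_workers):
    """Orchestrator step (assumed strategy): round-robin split into sub-problems."""
    return [entities[i::n_workers] for i in range(n_workers)]


def lookup(entity):
    """Stand-in for a real web search; returns a fake schema-aligned row."""
    return {"entity": entity, "name_length": len(entity)}


def worker(chunk, ws):
    for entity in chunk:
        if ws.claim(entity):  # skip entities another worker already took
            ws.publish(entity, lookup(entity))


def run(entities, n_workers=2):
    ws = Workspace()
    chunks = decompose(entities, n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        # list() forces completion and re-raises any worker exception
        list(pool.map(lambda chunk: worker(chunk, ws), chunks))
    requested = list(dict.fromkeys(entities))  # de-duplicate, keep order
    # Verify step: entities with no published row are coverage gaps.
    missing = [e for e in requested if ws.rows.get(e) is None]
    table = [ws.rows[e] for e in requested if ws.rows.get(e) is not None]
    return table, missing
```

Calling `run(["a", "bb", "a", "ccc"])` returns one row per distinct entity and an empty list of gaps. In a fuller version, a non-empty `missing` list would be fed back to the orchestrator for another round of decomposition.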