WebWalker: Benchmarking LLMs in Web Traversal

📅 2025-01-13

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Problem: Large language models (LLMs) struggle to comprehend and integrate information scattered across multiple web pages. Method: This paper introduces WebWalker, a multi-agent framework with two roles, an explorer and a critic, that performs human-like multi-hop web navigation. It incorporates RAG-enhanced vertical and horizontal cross-page retrieval, dynamic URL path planning, and content verification. The paper also proposes WebWalkerQA, described as the first benchmark explicitly designed for evaluating structured web traversal. Contribution/Results: Experiments show that WebWalker improves complex question-answering accuracy on WebWalkerQA, with a reported 37.2% absolute gain over baseline methods on multi-hop reasoning tasks. This is presented as the first systematic validation of LLMs' ability to acquire deep, contextualized web information through active, goal-directed navigation, demonstrating both effectiveness and scalability in real-world web interaction scenarios.

📝 Abstract

Retrieval-augmented generation (RAG) demonstrates remarkable performance across tasks in open-domain question answering. However, traditional search engines may retrieve shallow content, limiting the ability of LLMs to handle complex, multi-layered information. To address this, we introduce WebWalkerQA, a benchmark designed to assess the ability of LLMs to perform web traversal. It evaluates the capacity of LLMs to systematically traverse a website's subpages and extract high-quality data. We also propose WebWalker, a multi-agent framework that mimics human-like web navigation through an explore-critic paradigm. Extensive experimental results show that WebWalkerQA is challenging and demonstrate the effectiveness of combining RAG with WebWalker through horizontal and vertical integration in real-world scenarios.
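
As a rough illustration of the explore-critic idea mentioned in the abstract, here is a minimal Python sketch. It is not the authors' implementation: the in-memory `SITE` graph, the `Critic` class, the `explore` function, the `needed` keyword set, and the keyword-overlap scoring are all hypothetical stand-ins for the live web pages and LLM calls a real system would use.

```python
from dataclasses import dataclass, field

# Hypothetical toy site: URL -> (page text, outgoing links). Not from the paper.
SITE = {
    "/": ("Welcome to the conference site.", ["/program", "/venue"]),
    "/program": ("Program overview and schedule.", ["/program/keynotes"]),
    "/venue": ("The venue is the city convention center.", []),
    "/program/keynotes": ("The keynote deadline is March 3.", []),
}


def words(text):
    """Lowercased word set; a crude stand-in for LLM understanding."""
    return {w.strip(".,?!").lower() for w in text.split()}


@dataclass
class Critic:
    """Stores useful observations and judges whether they suffice."""
    needed: set  # assumption: terms the evidence must cover before answering
    memory: list = field(default_factory=list)

    def observe(self, url, text):
        # Keep a page only if it mentions at least one needed term.
        if words(text) & self.needed:
            self.memory.append((url, text))

    def can_answer(self):
        seen = set()
        for _, text in self.memory:
            seen |= words(text)
        return self.needed <= seen


def explore(question, needed, start="/", max_steps=10):
    """Explorer: follow links, preferring URLs that overlap the question."""
    q = words(question)
    critic = Critic(needed=set(needed))
    frontier, visited = [start], []
    while frontier and len(visited) < max_steps:
        # Rank candidate links by word overlap between URL path and question.
        frontier.sort(key=lambda u: len(words(u.replace("/", " ")) & q),
                      reverse=True)
        url = frontier.pop(0)
        if url in visited:
            continue
        visited.append(url)
        text, links = SITE[url]
        critic.observe(url, text)
        if critic.can_answer():  # the critic decides when to stop exploring
            break
        frontier.extend(link for link in links if link not in visited)
    return visited, critic.memory
```

Calling `explore("What is the keynote deadline in the program?", needed={"keynote", "deadline"})` walks from the root page down through `/program` to `/program/keynotes` and stops there, skipping `/venue`. The point of the split is that the explorer only chooses where to click next, while the critic alone decides what to remember and when the gathered evidence is enough.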

Problem

Research questions and friction points this paper is trying to address.

Large Language Models

Web Browsing Simulation

Information Integration

Innovation

Methods, ideas, or system contributions that make the work stand out.

WebWalkerQA

RAG Technology

Web Information Retrieval
