Web-CogReasoner: Towards Knowledge-Induced Cognitive Reasoning for Web Agents

📅 2025-08-03

📈 Citations: 0

✨ Influential: 0


🤖 AI Summary

This paper addresses the limited cognitive reasoning capability of web agents. It argues that an agent must first acquire sufficient knowledge before it can reason effectively. To formalize this, it proposes the Web-CogKnowledge Framework, which categorizes knowledge as Factual, Conceptual, and Procedural and splits a web agent's capabilities into two stages. Stage I, knowledge content learning, covers Memorizing and Understanding and relies on Factual and Conceptual knowledge (the "what" of learning). Stage II, cognitive processes, covers Exploring and is grounded in Procedural knowledge (the "how" of reasoning and action). To instill this knowledge, the authors build Web-CogDataset, a structured resource curated from 14 real-world websites, and use it to train Web-CogReasoner, an agent built on multimodal large language models that performs knowledge-driven Chain-of-Thought (CoT) reasoning. They also introduce Web-CogBench, an evaluation suite that assesses agent performance across the defined knowledge domains and cognitive capabilities. Experiments show that Web-CogReasoner significantly outperforms existing models, especially when generalizing to unseen tasks where structured knowledge is decisive.


📝 Abstract

Multimodal large-scale models have significantly advanced the development of web agents, enabling them to perceive and interact with digital environments in a manner akin to human cognition. In this paper, we argue that web agents must first acquire sufficient knowledge to engage effectively in cognitive reasoning. We therefore decompose a web agent's capabilities into two essential stages: knowledge content learning and cognitive processes. To formalize this, we propose the Web-CogKnowledge Framework, which categorizes knowledge as Factual, Conceptual, and Procedural. In this framework, knowledge content learning corresponds to the agent's processes of Memorizing and Understanding, which rely on the first two knowledge types and represent the "what" of learning. Cognitive processes, by contrast, correspond to Exploring, which is grounded in Procedural knowledge and defines the "how" of reasoning and action. To facilitate knowledge acquisition, we construct the Web-CogDataset, a structured resource curated from 14 real-world websites and designed to systematically instill the core knowledge that web agents need. This dataset serves both as the agent's conceptual grounding (the "nouns" upon which comprehension is built) and as the basis for learning how to reason and act. Building on this foundation, we operationalize these processes through a novel knowledge-driven Chain-of-Thought (CoT) reasoning framework, which we use to develop and train our proposed agent, the Web-CogReasoner. Extensive experiments show that it significantly outperforms existing models, especially in generalizing to unseen tasks where structured knowledge is decisive. To enable rigorous evaluation, we introduce the Web-CogBench, a comprehensive evaluation suite designed to assess and compare agent performance across the delineated knowledge domains and cognitive capabilities. Our code and data are open-sourced at https://github.com/Gnonymous/Web-CogReasoner
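The taxonomy in the abstract can be written down as a small data structure: three knowledge types, three cognitive processes, and the two stages they belong to. The sketch below is illustrative only and is not taken from the authors' code. The class and function names (Stage, KnowledgeType, CognitiveProcess, processes_in) are hypothetical. The one-to-one pairing of Memorizing with Factual and Understanding with Conceptual knowledge is an assumption: the abstract only states that the two processes together rely on the first two knowledge types.

```python
from dataclasses import dataclass
from enum import Enum


class Stage(Enum):
    # The two stages named in the abstract.
    KNOWLEDGE_CONTENT_LEARNING = "knowledge content learning"  # the "what"
    COGNITIVE_PROCESSES = "cognitive processes"  # the "how"


class KnowledgeType(Enum):
    FACTUAL = "Factual"
    CONCEPTUAL = "Conceptual"
    PROCEDURAL = "Procedural"


@dataclass(frozen=True)
class CognitiveProcess:
    name: str
    knowledge: KnowledgeType  # primary knowledge type the process draws on
    stage: Stage


# ASSUMPTION: Memorizing -> Factual and Understanding -> Conceptual is one
# plausible reading; the abstract only groups both under the first two types.
PROCESSES = (
    CognitiveProcess("Memorizing", KnowledgeType.FACTUAL, Stage.KNOWLEDGE_CONTENT_LEARNING),
    CognitiveProcess("Understanding", KnowledgeType.CONCEPTUAL, Stage.KNOWLEDGE_CONTENT_LEARNING),
    CognitiveProcess("Exploring", KnowledgeType.PROCEDURAL, Stage.COGNITIVE_PROCESSES),
)


def processes_in(stage: Stage) -> list[str]:
    """Return the names of the cognitive processes assigned to a stage."""
    return [p.name for p in PROCESSES if p.stage is stage]


if __name__ == "__main__":
    for stage in Stage:
        print(f"{stage.value}: {processes_in(stage)}")
```

For example, `processes_in(Stage.KNOWLEDGE_CONTENT_LEARNING)` returns `['Memorizing', 'Understanding']`. Keeping the stage on each process, rather than nesting processes under stages, makes it easy to filter by either axis.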

Problem


Enhancing web agents with cognitive reasoning through knowledge acquisition

Structuring knowledge into Factual, Conceptual, and Procedural types for learning

Developing a knowledge-driven Chain-of-Thought framework for web agent reasoning

Innovation


Web-CogKnowledge Framework categorizes knowledge as Factual, Conceptual, and Procedural

Web-CogDataset provides structured knowledge from 14 real-world websites

Knowledge-driven Chain-of-Thought reasoning operationalizes the cognitive processes in the Web-CogReasoner agent
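To make "knowledge-driven Chain-of-Thought" concrete, here is a hypothetical prompt scaffold that orders reasoning the way the taxonomy suggests: first recall what is on the page (Factual), then interpret what the elements are for (Conceptual), then choose an action (Procedural). This page does not describe the paper's actual CoT format. The function `build_cot_prompt`, the `SECTIONS` table, and the wording of each step are assumptions made for illustration.

```python
from typing import Sequence

# HYPOTHETICAL step wording; not the authors' prompt format.
SECTIONS = (
    ("Factual", "List the page elements relevant to the task."),
    ("Conceptual", "Explain what each relevant element is for."),
    ("Procedural", "Choose the next action and justify it."),
)


def build_cot_prompt(task: str, elements: Sequence[str]) -> str:
    """Build a reasoning scaffold that moves from facts to concepts to actions."""
    lines = [f"Task: {task}", "Page elements:"]
    # Index each element so a later action can refer to it by number.
    lines += [f"  [{i}] {el}" for i, el in enumerate(elements)]
    for step, (kind, instruction) in enumerate(SECTIONS, start=1):
        lines.append(f"Step {step} ({kind} knowledge): {instruction}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(build_cot_prompt("Search for laptops", ["search box", "submit button"]))
```

Fixing the step order in code, rather than leaving it to free-form generation, is one simple way to make each reasoning step traceable to a knowledge type.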
