WorkForceAgent-R1: Incentivizing Reasoning Capability in LLM-based Web Agents via Reinforcement Learning

📅 2025-05-28
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the limited reasoning capability, poor generalization, and low robustness of existing supervised fine-tuning (SFT)-based web agents in dynamic interactive environments, this paper proposes WorkForceAgent-R1, an LLM-based web agent for business-oriented web navigation. Its core method is a rule-based, R1-style reinforcement learning paradigm that implicitly acquires single-step reasoning and planning capabilities without requiring human-annotated reasoning traces or extensive expert demonstrations. The approach uses a structured reward function that evaluates both output-format compliance and action correctness. On the WorkArena benchmark, the method outperforms SFT baselines by 10.26–16.59% and achieves performance competitive with proprietary LLM-based agents such as GPT-4o, demonstrating its effectiveness and practicality in complex, dynamic web environments.

📝 Abstract
Large language model (LLM)-empowered web agents enable the automation of complex, real-time web navigation tasks in enterprise environments. However, existing web agents relying on supervised fine-tuning (SFT) often struggle with generalization and robustness due to insufficient reasoning capabilities when handling the inherently dynamic nature of web interactions. In this study, we introduce WorkForceAgent-R1, an LLM-based web agent trained using a rule-based R1-style reinforcement learning framework designed explicitly to enhance single-step reasoning and planning for business-oriented web navigation tasks. We employ a structured reward function that evaluates both adherence to output formats and correctness of actions, enabling WorkForceAgent-R1 to implicitly learn robust intermediate reasoning without explicit annotations or extensive expert demonstrations. Extensive experiments on the WorkArena benchmark demonstrate that WorkForceAgent-R1 substantially outperforms SFT baselines by 10.26–16.59%, achieving competitive performance relative to proprietary LLM-based agents (gpt-4o) in workplace-oriented web navigation tasks.
Problem

Research questions and friction points this paper is trying to address.

Enhancing reasoning in LLM web agents for dynamic tasks
Improving generalization in business web navigation via RL
Boosting performance without expert annotations or demonstrations
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses rule-based, R1-style reinforcement learning
Enhances single-step reasoning and planning
Employs a structured reward combining output-format compliance and action correctness
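The structured reward described above can be sketched in a few lines. This is a hypothetical illustration only, not the paper's implementation: the `<think>`/`<action>` output template, the reward magnitudes, and the exact-match correctness check are all assumptions chosen to show how a rule-based reward might combine a format term with an action-correctness term.

```python
import re

# Assumed output template: "<think>...</think><action>...</action>".
# Weights (-1.0, 0.2, 1.0) are illustrative, not from the paper.
FORMAT_PATTERN = re.compile(r"^<think>.+</think>\s*<action>(.+)</action>$", re.DOTALL)

def structured_reward(output: str, gold_action: str) -> float:
    """Return a scalar reward: format-compliance term plus action-correctness term."""
    match = FORMAT_PATTERN.match(output.strip())
    if match is None:
        return -1.0            # malformed output: format penalty
    format_reward = 0.2        # well-formed <think>/<action> structure
    predicted = match.group(1).strip()
    action_reward = 1.0 if predicted == gold_action else 0.0
    return format_reward + action_reward
```

Because the reward is computed from output structure and a reference action alone, no human-written reasoning traces are needed; the policy is incentivized to produce well-formed reasoning that leads to correct actions.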