Synthesizing Agentic Data for Web Agents with Progressive Difficulty Enhancement Mechanisms

📅 2025-10-15
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing instruction-tuning datasets lack controllable difficulty and rigorous quality assurance, hindering long-horizon web reasoning; moreover, data efficacy is often conflated with training dynamics, impeding independent evaluation. Method: a dual-path controllable data synthesis framework: (i) knowledge-graph-guided task generation and (ii) multi-role agent collaboration (questioning, verification, filtering) for iterative distillation, enabling fine-grained difficulty progression and factual-consistency validation. Crucially, data construction is decoupled from model training to enable standalone data-quality assessment, a first in the web-agent domain. Results: despite its smaller scale, the synthesized dataset achieves twice the tool-call diversity of existing datasets and substantially reduces redundant API invocations. Web agents trained on it attain state-of-the-art performance across multiple benchmarks and show markedly improved long-horizon reasoning.
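The knowledge-graph-guided generation path can be pictured as walking relation chains over a graph, where the final node reached is the answer. A minimal sketch follows; the toy graph, relation names, and question phrasing are all illustrative assumptions, not the paper's actual pipeline.

```python
# Toy knowledge graph: (entity, relation) -> target entity.
# Both the facts and the relation names are made up for illustration.
KG = {
    ("Marie Curie", "born_in"): "Warsaw",
    ("Warsaw", "capital_of"): "Poland",
}

def multi_hop_question(entity, relations):
    """Follow a relation path from `entity`; the node reached is the answer."""
    node = entity
    for rel in relations:
        node = KG[(node, rel)]
    path = " then ".join(rel.replace("_", " ") for rel in relations)
    question = f"Starting from {entity}, follow {path}: what do you reach?"
    return question, node

# A 2-hop question whose answer requires chaining two facts.
q, a = multi_hop_question("Marie Curie", ["born_in", "capital_of"])
```

Longer relation paths yield harder questions, which is what makes graph-guided generation a natural handle for difficulty control.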

📝 Abstract
Web-based 'deep research' agents aim to solve complex question-answering tasks through long-horizon interactions with online tools. These tasks remain challenging, as the underlying language models are often not optimized for long-horizon reasoning and exploration. Prior work has proposed workflows for constructing instruction-tuning datasets, often leveraging knowledge graphs. However, such methods typically lack fine-grained control over difficulty and quality, yielding synthetic data that falls short of capturing the complexity required for long-horizon reasoning. Furthermore, many studies conflate data and training effects by comparing models trained under different optimization recipes, making it difficult to isolate and evaluate the effectiveness of the data itself. We introduce a two-pronged data synthesis pipeline that generates question-answer pairs by progressively increasing task complexity until a frontier baseline web agent fails. The baseline agent plays multiple roles in this process: attempting the questions, validating factuality, checking for alternative answers, and enforcing filtering. To evaluate the effectiveness of our synthesis methods, we adopt a controlled training setup based on distillation from strong web agents. Experiments across multiple web-based benchmarks show that our dataset, despite being smaller, enables the training of more effective web agents than existing datasets. In particular, our data exhibits twice the diversity in tool-use actions, allowing models trained on it to achieve stronger performance while avoiding repetitive tool-calling behaviors.
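The core loop of the abstract, escalating task complexity until the baseline agent fails, can be sketched in a few lines. The `harden_task` and `baseline_agent_solves` helpers below are hypothetical stand-ins, not the paper's components.

```python
def harden_task(task):
    """Hypothetical difficulty step: add one more hop/constraint to the task."""
    return {"question": task["question"] + " (one more hop)",
            "hops": task["hops"] + 1}

def baseline_agent_solves(task):
    """Stand-in for a frontier baseline web agent; here it fails beyond 3 hops."""
    return task["hops"] <= 3

def synthesize(seed_task, max_rounds=10):
    """Escalate difficulty until the baseline agent fails, then keep that task."""
    task = seed_task
    for _ in range(max_rounds):
        if not baseline_agent_solves(task):
            return task  # hard enough: the baseline can no longer answer it
        task = harden_task(task)
    return None  # never exceeded the baseline within the round budget

hard = synthesize({"question": "Who founded X?", "hops": 1})
```

The baseline's failure point acts as a moving difficulty threshold, so the kept tasks are, by construction, just beyond what the baseline agent can already do.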
Problem

Research questions and friction points this paper is trying to address.

Generating high-quality synthetic data for web agents
Controlling difficulty levels in agent training datasets
Isolating data effectiveness from training methodologies
Innovation

Methods, ideas, or system contributions that make the work stand out.

Progressive difficulty enhancement for data synthesis
Two-pronged pipeline generating question-answer pairs
Controlled training setup with agent distillation
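The multi-role quality gate described above (attempting, fact-checking, checking for alternative answers, filtering) can be sketched as a chain of per-role predicates over candidate question-answer pairs. All helper names and the toy checks are illustrative assumptions.

```python
def is_factual(qa):
    """Stand-in fact-check role: the answer must be verifiable (non-empty here)."""
    return bool(qa["answer"])

def has_unique_answer(qa):
    """Stand-in uniqueness role: no alternative answers were found."""
    return len(qa.get("alternatives", [])) == 0

def passes_filter(qa):
    """Keep a candidate pair only if every role signs off."""
    return is_factual(qa) and has_unique_answer(qa)

# Illustrative candidates: only the first survives both checks.
candidates = [
    {"question": "Q1", "answer": "Paris", "alternatives": []},
    {"question": "Q2", "answer": "", "alternatives": []},          # fails fact-check
    {"question": "Q3", "answer": "42", "alternatives": ["forty-two"]},  # ambiguous
]
dataset = [qa for qa in candidates if passes_filter(qa)]
```

In the paper's setting each role is played by the baseline agent itself rather than by hand-written predicates; the structure of the gate is the same.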