AI Summary
This work addresses weak out-of-distribution generalization and low training efficiency in offline training of web agents, both of which stem from noisy, redundant trajectories and excessively long state sequences. The authors propose Weasel, a method that formulates trajectory selection as an optimization problem balancing importance and diversity across states, websites, and interaction patterns. A greedy algorithm selects high-quality trajectory steps under a fixed budget. Training cost is further reduced by target-centric AXTree pruning, and expert reasoning traces are replaced with model self-generated, style-consistent reasoning paths. Experiments across multiple benchmarks and large language models show that Weasel achieves 9.7-12.5× faster training while improving task completion performance on out-of-distribution domains.
Abstract
Large language models (LLMs) have enabled web agents that follow natural language goals through multi-step browser interactions. However, agents fine-tuned on specific trajectories and domains often struggle to generalize out of domain, and offline training can be compute-inefficient due to noisy, redundant trajectories and long accessibility-tree (AXTree) states. To address both issues, we propose Weasel, a trajectory selection method for offline training of web agents. Weasel selects a fixed-budget subset of trajectory steps by optimizing an objective that balances unary importance with pairwise diversity over states, websites, and interaction patterns, and it solves this objective efficiently with a greedy algorithm. We further improve efficiency with target-centered AXTree pruning that keeps only content around the ground-truth action target, and we mitigate style mismatch for reasoning-native models by replacing expert traces with model-generated, style-consistent rationales. Across the AgentTrek and NNetNav training datasets, evaluations in WebArena, WorkArena, and MiniWob, and experiments with Qwen2.5-7B, Gemma3-4B, and Qwen3-8B, Weasel improves out-of-domain performance while reducing training cost, producing roughly 9.7-12.5$\times$ training speedups over standard fine-tuning. We make the code available at https://github.com/fatemehpesaran310/weasel.
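To make the selection idea concrete, the following is a minimal sketch of greedy, fixed-budget selection that trades a per-step importance score against pairwise redundancy. It is not the authors' implementation: the abstract does not give the exact objective, so the scoring rule (importance minus a weighted sum of similarities to already-selected steps), the function name `greedy_select`, the `diversity_weight` parameter, and the toy similarity over website and action type are all assumptions made for illustration.

```python
from typing import Callable, List, Sequence


def greedy_select(
    importance: Sequence[float],
    similarity: Callable[[int, int], float],
    budget: int,
    diversity_weight: float = 1.0,
) -> List[int]:
    """Greedily pick up to `budget` step indices.

    Assumed objective (illustrative only): each pick maximizes
    importance[i] - diversity_weight * sum of similarity(i, j)
    over the already-selected steps j.
    """
    selected: List[int] = []
    remaining = set(range(len(importance)))
    # Accumulated redundancy of each candidate against the selected set.
    penalty = [0.0] * len(importance)

    while remaining and len(selected) < budget:
        # Iterate in sorted order so ties resolve to the lowest index.
        best = max(
            sorted(remaining),
            key=lambda i: importance[i] - diversity_weight * penalty[i],
        )
        selected.append(best)
        remaining.remove(best)
        # Update penalties incrementally instead of recomputing all pairs.
        for i in remaining:
            penalty[i] += similarity(best, i)
    return selected


# Hypothetical toy data: (website, action type) per trajectory step.
steps = [("shop", "click"), ("shop", "click"), ("maps", "type"), ("shop", "type")]
scores = [0.9, 0.85, 0.5, 0.4]


def toy_similarity(i: int, j: int) -> float:
    """Assumed similarity: 0.5 for same website plus 0.5 for same action."""
    same_site = 0.5 if steps[i][0] == steps[j][0] else 0.0
    same_action = 0.5 if steps[i][1] == steps[j][1] else 0.0
    return same_site + same_action


print(greedy_select(scores, toy_similarity, budget=2))  # [0, 2]
```

With the diversity term active, the second pick skips the near-duplicate step 1 in favor of step 2, which comes from a different website and uses a different action type. Setting `diversity_weight=0.0` reduces the procedure to plain top-k by importance.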