🤖 AI Summary
This study systematically investigates whether tool use consistently improves the performance of web agents. Through large-scale, carefully controlled experiments in a unified, comparable setting, it evaluates the impact of different tool sources, backbone large language models, tool-integration frameworks, and web-task benchmarks. The findings show that tool use does not universally improve performance; its benefits depend heavily on specific design choices. The study distills several practical design principles, surfaces previously underappreciated side effects, revises some established assumptions, and offers a more reliable empirical foundation for deploying tools in web agents.
📝 Abstract
As web agents rapidly evolve, an increasing body of work has moved beyond conventional atomic browser interactions and explored tool use as a higher-level action paradigm. Although prior studies have shown the promise of tools, their conclusions are often drawn from limited experimental scale and sometimes non-comparable settings. As a result, several fundamental questions remain unclear: i) whether tools provide consistent gains for web agents, ii) what practical design principles characterize effective tools, and iii) what side effects tool use may introduce. To establish a stronger empirical foundation for future research, we revisit tool use in web agents through an extensive and carefully controlled study across diverse tool sources, backbone models, tool-use frameworks, and evaluation benchmarks. Our findings both revise some prior conclusions and complement others with broader evidence. We hope this study provides a more reliable empirical basis and inspires future research on tool-using web agents.
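To make the distinction between "atomic browser interactions" and "tool use as a higher-level action paradigm" concrete, here is a minimal sketch of the two action spaces. All names here (`Click`, `TypeText`, `ToolCall`, `web_search`) are illustrative assumptions for exposition, not interfaces from the paper.

```python
from dataclasses import dataclass
from typing import Union

# Conventional atomic browser interactions: each action manipulates
# a single low-level element of the page.
@dataclass
class Click:
    element_id: str

@dataclass
class TypeText:
    element_id: str
    text: str

# A higher-level tool action: one call encapsulates a multi-step
# capability (here, a hypothetical "web_search" tool).
@dataclass
class ToolCall:
    tool_name: str   # e.g. "web_search"
    arguments: dict  # e.g. {"query": "flights NYC to SFO"}

Action = Union[Click, TypeText, ToolCall]

def execute(action: Action) -> str:
    """Dispatch an agent action; stubbed out for illustration."""
    if isinstance(action, ToolCall):
        return f"invoke tool {action.tool_name} with {action.arguments}"
    return f"perform browser step {action}"

# The same goal expressed in both paradigms:
atomic_plan = [
    Click("search-box"),
    TypeText("search-box", "flights NYC to SFO"),
    Click("submit"),
]
tool_plan = [ToolCall("web_search", {"query": "flights NYC to SFO"})]
```

In these terms, the question the study examines is whether replacing plans like `atomic_plan` with plans like `tool_plan` reliably helps, and how the answer shifts across tool sources, backbone models, tool-use frameworks, and benchmarks.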