🤖 AI Summary
Web agents (e.g., Operator, Project Mariner) extend the capabilities of large language models (LLMs) to interactive web environments, yet their energy consumption and carbon footprint, both critical sustainability concerns, have not been systematically assessed. This work presents an initial study of energy usage and carbon emissions in web agents, combining theoretical estimation with empirical benchmarking across diverse architectural choices, including action granularity, planning depth, and tool invocation strategies. The authors find that higher energy consumption does not consistently improve task performance, and that some current systems do not disclose their model parameters and execution workflows, which hinders accurate energy estimation. Accordingly, they propose treating energy consumption as a primary evaluation metric and advocate for a sustainability-oriented evaluation framework. They emphasize auditability of both models and execution traces to enable transparent, reproducible energy accounting. The methodology and recommendations offer guidance for developing energy-efficient web agents.
📝 Abstract
Web agents, like OpenAI's Operator and Google's Project Mariner, are powerful agentic systems pushing the boundaries of Large Language Models (LLMs). They can autonomously interact with the internet at the user's behest, for example by navigating websites, filling in search forms, and comparing price lists. Though web agent research is thriving, the sustainability issues it induces remain largely unexplored. To highlight the urgency of this issue, we provide an initial exploration of the energy and $\mathrm{CO_2}$ cost associated with web agents from both a theoretical perspective (via estimation) and an empirical perspective (via benchmarking). Our results show how different philosophies in web agent creation can severely impact the energy expended, and that more energy consumed does not necessarily equate to better results. We highlight the lack of transparency about the model parameters and processes used for some web agents as a limiting factor when estimating energy consumption. Our work contributes towards a change in thinking about how we evaluate web agents, advocating for dedicated metrics measuring energy consumption in benchmarks.
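To make the idea of estimating energy and $\mathrm{CO_2}$ cost concrete, the following is a minimal back-of-the-envelope sketch, not the estimation procedure used in this work. It assumes energy can be approximated as the number of LLM calls times time per call times average device power, scaled by a data-centre overhead factor, and that emissions follow from a grid carbon intensity. All names (`AgentRun`, `energy_kwh`, `co2_grams`, `energy_per_success_kwh`) and all numeric values are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class AgentRun:
    """One web-agent task run. Every field is an illustrative assumption."""
    llm_calls: int             # number of LLM invocations during the run
    seconds_per_call: float    # average accelerator time per invocation
    device_power_watts: float  # average power draw of the serving hardware
    pue: float = 1.0           # data-centre overhead (power usage effectiveness)


def energy_kwh(run: AgentRun) -> float:
    """Approximate energy of a run: calls * time * power * overhead, in kWh."""
    joules = run.llm_calls * run.seconds_per_call * run.device_power_watts * run.pue
    return joules / 3.6e6  # 1 kWh = 3.6e6 J


def co2_grams(run: AgentRun, grid_intensity_g_per_kwh: float) -> float:
    """Approximate emissions, given a grid carbon intensity in gCO2 per kWh."""
    return energy_kwh(run) * grid_intensity_g_per_kwh


def energy_per_success_kwh(runs: list[AgentRun], successes: int) -> float:
    """A possible benchmark metric: total energy divided by solved tasks."""
    if successes <= 0:
        raise ValueError("at least one successful task is required")
    return sum(energy_kwh(r) for r in runs) / successes


# Hypothetical numbers, not measurements from any real agent.
example = AgentRun(llm_calls=20, seconds_per_call=1.5,
                   device_power_watts=300.0, pue=1.2)
print(f"{energy_kwh(example):.4f} kWh, {co2_grams(example, 400.0):.2f} g CO2")
```

A metric such as `energy_per_success_kwh` reports energy alongside task success, so an agent that spends more energy without solving more tasks scores worse. When model size or serving hardware is undisclosed, inputs like `seconds_per_call` and `device_power_watts` can only be guessed, which is the transparency limitation the abstract describes.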