🤖 AI Summary
This work addresses a security vulnerability in browser agents operating in dynamic web environments: the gap between planning and execution can cause an agent to act on a stale page state, which is a time-of-check-to-time-of-use (TOCTOU) flaw. The study presents the first systematic characterization and formalization of this issue and introduces a large-scale benchmark of synthetic and real-world websites, which it uses to evaluate ten prominent open-source browser agents. To mitigate the risk, the authors propose a lightweight mechanism that validates both DOM structure and layout state immediately before action execution, thereby enforcing atomicity between the check and the action. Experimental results show that this approach substantially reduces erroneous interactions caused by dynamic page changes, improving the safety and reliability of browser automation tasks.
📝 Abstract
Browser-use agents are widely used for everyday tasks. They enable automated interaction with web pages through structured DOM-based interfaces or vision-language models operating on page screenshots. However, web pages often change between planning and execution, causing agents to execute actions based on stale assumptions. We view this temporal mismatch as a time-of-check-to-time-of-use (TOCTOU) vulnerability in browser-use agents. Dynamic or adversarial web content can exploit this window to induce unintended actions. We present a large-scale empirical study of TOCTOU vulnerabilities in browser-use agents using a benchmark that spans synthesized and real-world websites. Using this benchmark, we evaluate 10 popular open-source agents and show that TOCTOU vulnerabilities are widespread. We design a lightweight mitigation based on pre-execution validation. It monitors DOM and layout changes during planning and validates the page state immediately before action execution. This approach reduces the risk of insecure execution and mitigates unintended side effects in browser-use agents.
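The pre-execution validation idea can be illustrated with a minimal Python sketch. It is not the authors' implementation: `ElementState`, `fingerprint`, `execute_if_unchanged`, and `StalePageError` are hypothetical names, and the snapshot fields (tag, text, attributes, bounding box) are an assumed stand-in for whatever DOM and layout state a real agent would capture.

```python
import hashlib
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass(frozen=True)
class ElementState:
    """Hypothetical snapshot of an action target (DOM plus layout)."""
    tag: str
    text: str
    attrs: Tuple[Tuple[str, str], ...]  # sorted (name, value) pairs
    box: Tuple[int, int, int, int]      # assumed layout: x, y, width, height


class StalePageError(RuntimeError):
    """Raised when the target changed between planning and execution."""


def fingerprint(state: ElementState) -> str:
    # Hash DOM and layout fields together so a change in either is detected.
    payload = repr((state.tag, state.text, state.attrs, state.box))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def execute_if_unchanged(
    planned: ElementState,
    observe: Callable[[], ElementState],
    act: Callable[[], None],
) -> None:
    """Re-observe the target right before acting; abort if it differs."""
    current = observe()  # time of use: fresh snapshot
    if fingerprint(current) != fingerprint(planned):
        raise StalePageError("target changed since planning; re-plan")
    act()
```

In this sketch, `observe` and `act` are callbacks that a real agent would bind to its browser driver. A short window remains between the fresh observation and the action itself, so the check narrows the TOCTOU window rather than eliminating it.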