Cybernaut: Towards Reliable Web Automation

📅 2025-08-21
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing LLM-driven automation methods exhibit significant limitations in execution consistency, precise identification of critical DOM elements, and evaluability when applied to poorly designed, structurally irregular web interfaces within enterprise intranets. To address these challenges, this paper proposes: (1) a standardized operational workflow generation mechanism that reliably transforms demonstrations into robust, executable instructions; (2) a high-precision HTML element localization model integrating both semantic and structural features; and (3) a behavior-trajectory-based quantitative evaluation framework for measuring execution consistency. Evaluated on an internal benchmark, our approach improves task success rate from 72.0% to 88.68% and achieves 84.7% accuracy in operation pattern recognition. These advances substantially enhance the stability, interpretability, and assessability of AI agents in real-world industrial environments.

📝 Abstract
The emergence of AI-driven web automation through Large Language Models (LLMs) offers unprecedented opportunities for optimizing digital workflows. However, deploying such systems in real-world industrial environments presents four core challenges: (1) ensuring consistent execution, (2) accurately identifying critical HTML elements, (3) reaching human-level accuracy so that operations can be automated at scale, and (4) the lack of comprehensive benchmarking data on internal web applications. Existing solutions are primarily tailored for well-designed, consumer-facing websites (e.g., Amazon.com, Apple.com) and fall short in addressing the complexity of poorly designed internal web interfaces. To address these limitations, we present Cybernaut, a novel framework that ensures high execution consistency in web automation agents designed for robust enterprise use. Our contributions are threefold: (1) a Standard Operating Procedure (SOP) generator that converts user demonstrations into reliable automation instructions for linear browsing tasks, (2) a high-precision HTML DOM element recognition system tailored to complex web interfaces, and (3) a quantitative metric to assess execution consistency. The empirical evaluation on our internal benchmark demonstrates that our framework yields a 23.2% relative improvement (from 72% to 88.68%) in task execution success rate over the browser_use baseline. Cybernaut identifies consistent execution patterns with 84.7% accuracy, enabling reliable confidence assessment and adaptive guidance during task execution in real-world systems. These results highlight Cybernaut's effectiveness in enterprise-scale web automation and lay a foundation for future advancements in the field.
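The 23.2% figure in the abstract is a relative gain over the baseline, not a difference in percentage points. The short check below reproduces it from the two success rates quoted above; the function and variable names are illustrative and do not come from the paper.

```python
def relative_improvement(baseline: float, improved: float) -> float:
    """Relative gain of `improved` over `baseline`, expressed as a percentage."""
    return (improved - baseline) / baseline * 100.0


# Success rates (%) as quoted in the abstract.
baseline_rate = 72.0    # browser_use
cybernaut_rate = 88.68  # Cybernaut

# Difference in percentage points versus gain relative to the baseline.
absolute_gain = cybernaut_rate - baseline_rate
relative_gain = relative_improvement(baseline_rate, cybernaut_rate)

print(f"absolute gain: {absolute_gain:.2f} percentage points")
print(f"relative gain: {relative_gain:.1f}%")
```

The absolute gain is 16.68 percentage points; dividing that by the 72% baseline gives the roughly 23.2% relative improvement reported.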
Problem

Research questions and friction points this paper is trying to address.

Ensuring consistent execution in web automation agents
Accurately identifying HTML elements in complex interfaces
Meeting human-like accuracy for enterprise-scale automation
Innovation

Methods, ideas, or system contributions that make the work stand out.

SOP generator converts user demonstrations into instructions
High-precision HTML DOM element recognition system
Quantitative metric to assess execution consistency
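This page does not define the paper's consistency metric. To give a feel for what a trajectory-based measure could look like, here is a minimal sketch that scores repeated runs of one task by the mean pairwise similarity of their action sequences. Everything in it is an assumption made for illustration: the `consistency_score` function, the `verb:target` action encoding, and the use of Python's `difflib.SequenceMatcher` as the similarity measure. It should not be read as Cybernaut's actual method.

```python
from difflib import SequenceMatcher
from itertools import combinations


def consistency_score(trajectories: list[list[str]]) -> float:
    """Mean pairwise similarity (0.0 to 1.0) across repeated runs of one task.

    Hypothetical illustration only; not the metric defined in the paper.
    """
    pairs = list(combinations(trajectories, 2))
    if not pairs:
        # Zero or one run: nothing to disagree with.
        return 1.0
    total = sum(SequenceMatcher(None, a, b).ratio() for a, b in pairs)
    return total / len(pairs)


# Three made-up runs of the same task; the third takes an extra detour.
runs = [
    ["open:/orders", "click:#search", "type:#query", "click:#submit"],
    ["open:/orders", "click:#search", "type:#query", "click:#submit"],
    ["open:/orders", "click:#menu", "click:#search", "type:#query", "click:#submit"],
]

score = consistency_score(runs)
print(f"consistency: {score:.3f}")
```

Identical runs score 1.0 under this sketch, and every extra or missing step pulls the score down, which is the kind of signal a confidence assessment during execution could draw on.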
Ankur Tomar
Applied AI, Amazon.com, Bellevue, WA, USA
Hengyue Liang
Applied Scientist, Amazon
Indranil Bhattacharya
Applied AI, Amazon.com, Bellevue, WA, USA
Natalia Larios
Applied AI, Amazon.com, Bellevue, WA, USA
Francesco Carbone
Applied AI, Amazon.com, Bellevue, WA, USA