WebVIA: A Web-based Vision-Language Agentic Framework for Interactive and Verifiable UI-to-Code Generation

📅 2025-11-09
📈 Citations: 0
✨ Influential: 0
📄 PDF
🤖 AI Summary
Existing UI-to-Code approaches generate only static HTML/CSS/JS, lacking support for interactive logic. To address this limitation, we propose WebVIA-Agent, the first agent-based framework for interactive UI-to-Code generation. It integrates a vision-language model (VLM), a multi-state interface exploration agent, stage-wise prompt engineering, fine-tuned UI2Code models, and a browser-based interactivity validation mechanism. This enables end-to-end generation of executable, interactive web code with automated verification. Experiments demonstrate that WebVIA-Agent achieves superior stability and accuracy in interface exploration compared to general-purpose agents. Moreover, the fine-tuned WebVIA-UI2Code model significantly outperforms state-of-the-art baselines on both static and interactive benchmarks, markedly improving the quality, reliability, and practical utility of UI-to-Code translation.

πŸ“ Abstract
User interface (UI) development requires translating design mockups into functional code, a process that remains repetitive and labor-intensive. While recent Vision-Language Models (VLMs) automate UI-to-Code generation, they generate only static HTML/CSS/JavaScript layouts lacking interactivity. To address this, we propose WebVIA, the first agentic framework for interactive UI-to-Code generation and validation. The framework comprises three components: 1) an exploration agent to capture multi-state UI screenshots; 2) a UI2Code model that generates executable interactive code; 3) a validation module that verifies the interactivity. Experiments demonstrate that WebVIA-Agent achieves more stable and accurate UI exploration than general-purpose agents (e.g., Gemini-2.5-Pro). In addition, our fine-tuned WebVIA-UI2Code models exhibit substantial improvements in generating executable and interactive HTML/CSS/JavaScript code, outperforming their base counterparts across both interactive and static UI2Code benchmarks. Our code and models are available at https://webvia.github.io.
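The three-stage loop described in the abstract (explore → generate → validate) can be sketched roughly as follows. This is a minimal, hypothetical skeleton: every class, function, and stub here is illustrative and not the paper's actual API.

```python
from dataclasses import dataclass

@dataclass
class UIState:
    """One reachable state of the interface (hypothetical container)."""
    name: str
    screenshot: bytes  # pixels captured for this state

def explore_states(mockup) -> list[UIState]:
    # Stage 1 (exploration agent): click through the interface and
    # capture a screenshot per reachable state. Stubbed with two states.
    return [UIState("initial", b""), UIState("menu-open", b"")]

def generate_code(states: list[UIState]) -> str:
    # Stage 2 (UI2Code model): turn the multi-state screenshots into
    # executable HTML/CSS/JS. Stubbed with a trivial toggle button.
    return ('<button onclick="document.body.classList'
            ".toggle('open')\">menu</button>")

def validate(code: str, states: list[UIState]) -> bool:
    # Stage 3 (validation module): replay interactions in a browser and
    # check each captured state is reproduced. Stubbed as a static check
    # that the code wires up at least one handler for >1 state.
    return "onclick" in code and len(states) > 1

def webvia_pipeline(mockup) -> tuple[str, bool]:
    """End-to-end: explore the mockup, generate code, verify it."""
    states = explore_states(mockup)
    code = generate_code(states)
    return code, validate(code, states)
```

In the real system the validation step would drive an actual browser (the paper describes browser-based interactivity verification), feeding failures back for regeneration; the stub only shows where that check sits in the loop.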
Problem

Research questions and friction points this paper is trying to address.

Automating UI-to-code generation for interactive web interfaces
Addressing limitations of static layouts from vision-language models
Developing verifiable interactive code through agentic exploration and validation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Agentic framework for interactive UI-to-code generation
Multi-state UI exploration agent captures screenshots
Validation module verifies generated code interactivity
Mingde Xu
Faculty of Mathematics, University of Waterloo
Zhen Yang
The Knowledge Engineering Group (KEG), Tsinghua University
Wenyi Hong
Tsinghua University
Lihang Pan
Zhipu AI
Xinyue Fan
Zhipu AI
Yan Wang
Zhipu AI
Xiaotao Gu
Zhipu AI
Bin Xu
The Knowledge Engineering Group (KEG), Tsinghua University
Jie Tang
UW Madison