WebVIA: A Web-based Vision-Language Agentic Framework for Interactive and Verifiable UI-to-Code Generation

📅 2025-11-09
📈 Citations: 0
✨ Influential: 0
📄 PDF
🤖 AI Summary
Existing UI-to-Code approaches generate only static HTML/CSS/JS, lacking support for interactive logic. To address this limitation, we propose WebVIA-Agent, the first agent-based framework for interactive UI-to-Code generation. It integrates a vision-language model (VLM), a multi-state interface exploration agent, stage-wise prompt engineering, fine-tuned UI2Code models, and a browser-based interactivity validation mechanism. This enables end-to-end generation of executable, interactive web code with automated verification. Experiments demonstrate that WebVIA-Agent achieves superior stability and accuracy in interface exploration compared to general-purpose agents. Moreover, the fine-tuned WebVIA-UI2Code model significantly outperforms state-of-the-art baselines on both static and interactive benchmarks, markedly improving the quality, reliability, and practical utility of UI-to-Code translation.

πŸ“ Abstract
User interface (UI) development requires translating design mockups into functional code, a process that remains repetitive and labor-intensive. While recent Vision-Language Models (VLMs) automate UI-to-Code generation, they generate only static HTML/CSS/JavaScript layouts lacking interactivity. To address this, we propose WebVIA, the first agentic framework for interactive UI-to-Code generation and validation. The framework comprises three components: 1) an exploration agent to capture multi-state UI screenshots; 2) a UI2Code model that generates executable interactive code; 3) a validation module that verifies the interactivity. Experiments demonstrate that WebVIA-Agent achieves more stable and accurate UI exploration than general-purpose agents (e.g., Gemini-2.5-Pro). In addition, our fine-tuned WebVIA-UI2Code models exhibit substantial improvements in generating executable and interactive HTML/CSS/JavaScript code, outperforming their base counterparts across both interactive and static UI2Code benchmarks. Our code and models are available at https://webvia.github.io.
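The three-stage loop described in the abstract (explore → generate → validate) can be sketched roughly as follows. This is a minimal, hypothetical skeleton: every class, function, and stub here is illustrative and not the paper's actual API.

```python
from dataclasses import dataclass

@dataclass
class UIState:
    """One reachable state of the interface (hypothetical container)."""
    name: str
    screenshot: bytes  # pixels captured for this state

def explore_states(mockup) -> list[UIState]:
    # Stage 1 (exploration agent): click through the interface and
    # capture a screenshot per reachable state. Stubbed with two states.
    return [UIState("initial", b""), UIState("menu-open", b"")]

def generate_code(states: list[UIState]) -> str:
    # Stage 2 (UI2Code model): turn the multi-state screenshots into
    # executable HTML/CSS/JS. Stubbed with a trivial toggle button.
    return ('<button onclick="document.body.classList'
            ".toggle('open')\">menu</button>")

def validate(code: str, states: list[UIState]) -> bool:
    # Stage 3 (validation module): replay interactions in a browser and
    # check each captured state is reproduced. Stubbed as a static check
    # that the code wires up at least one handler for >1 state.
    return "onclick" in code and len(states) > 1

def webvia_pipeline(mockup) -> tuple[str, bool]:
    """End-to-end: explore the mockup, generate code, verify it."""
    states = explore_states(mockup)
    code = generate_code(states)
    return code, validate(code, states)
```

In the real system the validation step would drive an actual browser (the paper describes browser-based interactivity verification), feeding failures back for regeneration; the stub only shows where that check sits in the loop.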
Problem

Research questions and friction points this paper is trying to address.

Automating UI-to-code generation for interactive web interfaces
Addressing limitations of static layouts from vision-language models
Developing verifiable interactive code through agentic exploration and validation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Agentic framework for interactive UI-to-code generation
Multi-state UI exploration agent captures screenshots
Validation module verifies generated code interactivity
Mingde Xu
Faculty of Mathematics, University of Waterloo
Zhen Yang
The Knowledge Engineering Group (KEG), Tsinghua University
Wenyi Hong
Tsinghua University
Lihang Pan
Zhipu AI
Xinyue Fan
Zhipu AI
Yan Wang
Zhipu AI
Xiaotao Gu
Zhipu AI
Bin Xu
The Knowledge Engineering Group (KEG), Tsinghua University
Jie Tang
UW Madison