WebRenderBench: Enhancing Web Interface Generation through Layout-Style Consistency and Reinforcement Learning

📅 2025-10-05
📈 Citations: 0
Influential: 0
🤖 AI Summary
Current approaches to generating web code from UI images suffer from limited benchmark diversity and unreliable evaluation protocols. To address these limitations, we introduce WebRenderBench, a large-scale benchmark of 22.5k real-world webpages, and propose a rendering-based evaluation metric for layout and style consistency that avoids the noise sensitivity of conventional structural comparison methods. We further introduce ALISA (Automated Layout and Style Inspection Agent), which uses this metric as a reward signal in reinforcement learning. Experiments show that ALISA significantly improves generation performance and achieves state-of-the-art results across multiple metrics.

📝 Abstract
Automating the conversion of UI images into web code is a critical task for front-end development and rapid prototyping. Advances in multimodal large language models (MLLMs) have made WebUI-to-Code increasingly feasible, yet existing benchmarks remain limited in data diversity and evaluation reliability. To address these issues, we present WebRenderBench, a large-scale benchmark of 22.5k webpages collected from real-world portal sites, offering greater diversity, complexity, and realism than prior benchmarks. We further propose a novel evaluation metric that measures layout and style consistency from the final rendered pages. Unlike vision-based methods that rely on costly LLM reasoning or structure-based comparisons vulnerable to noise and asymmetry, our approach enables more efficient, objective, and reliable UI quality assessment. Finally, we introduce the Automated Layout and Style Inspection Agent (ALISA), which integrates this metric into reinforcement learning as a reward signal to enhance training on crawled asymmetric webpages. Experiments show that ALISA significantly boosts generation performance, achieving state-of-the-art results across multiple metrics.
Problem

Research questions and friction points this paper is trying to address.

Automating UI image to web code conversion
Addressing limited diversity in existing benchmarks
Improving evaluation reliability for rendered webpages
Innovation

Methods, ideas, or system contributions that make the work stand out.

Large-scale real-world webpage benchmark for training
Layout-style consistency metric for rendered page evaluation
Reinforcement learning agent integrating metric as reward
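
To make the idea of a rendering-based layout score concrete, here is a minimal sketch. It is not the metric defined in the paper: the `Box` type, the `iou` and `layout_consistency` functions, and the greedy matching rule are all hypothetical choices made for illustration. It assumes the bounding boxes of visible elements have already been extracted from the reference page and the generated page after rendering.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Axis-aligned bounding box of a rendered element, in pixels (hypothetical type)."""
    x: float
    y: float
    w: float
    h: float


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two boxes; 0.0 when they do not overlap."""
    ix = max(0.0, min(a.x + a.w, b.x + b.w) - max(a.x, b.x))
    iy = max(0.0, min(a.y + a.h, b.y + b.h) - max(a.y, b.y))
    inter = ix * iy
    union = a.w * a.h + b.w * b.h - inter
    return inter / union if union > 0 else 0.0


def layout_consistency(reference: list[Box], generated: list[Box]) -> float:
    """Toy score in [0, 1], assumed for illustration only.

    Each reference box is greedily matched to the unused generated box with
    the highest IoU. The summed IoU is divided by the larger element count,
    so missing and extra elements both lower the score.
    """
    if not reference and not generated:
        return 1.0
    unused = list(generated)
    total = 0.0
    for ref in reference:
        if not unused:
            break
        best = max(unused, key=lambda g: iou(ref, g))
        total += iou(ref, best)
        unused.remove(best)
    return total / max(len(reference), len(generated))
```

Because a score of this kind is a single bounded scalar computed from the rendered output alone, it could in principle serve as a reward signal in a reinforcement learning loop, which is the role the abstract describes for the paper's own metric.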
Peichao Lai
Peking University, Beijing, China
Jinhui Zhuang
Xiamen Huaxia University, Xiamen, Fujian, China
Kexuan Zhang
Fuzhou University, Fuzhou, Fujian, China
Ningchang Xiong
City University of Hong Kong, Hong Kong SAR, China
Shengjie Wang
Tsinghua University
Robotics, Reinforcement learning, Bionic robotics
Yanwei Xu
Peking University, Beijing, China
Chong Chen
Huawei Cloud BU, Beijing, China
Yilei Wang
Alibaba Cloud
Bin Cui
Peking University, Beijing, China