🤖 AI Summary
To address common failure modes in automatic design-to-code translation (element omission, element distortion, and element misarrangement), this paper proposes DCGen, the first segment-aware, multimodal large language model (MLLM)-based approach for generating UI code directly from webpage screenshots. Methodologically, DCGen follows a divide-and-conquer strategy: it divides a screenshot into manageable visual segments, prompts an MLLM to generate code for each segment, and then reassembles the fragments into complete UI code for the entire page. Evaluated on a dataset of real-world websites with various MLLMs, DCGen achieves up to a 15% improvement in visual similarity and an 8% gain in code similarity for large input images. Human evaluation confirms that DCGen helps developers implement webpages significantly faster and with greater fidelity to the UI designs.
📝 Abstract
Websites are critical in today's digital world, with over 1.11 billion currently active and approximately 252,000 new sites launched daily. Converting website layout design into functional UI code is a time-consuming yet indispensable step of website development. Manual methods of converting visual designs into functional code present significant challenges, especially for non-experts. To explore automatic design-to-code solutions, we first conduct a motivating study on GPT-4o and identify three types of issues in generating UI code: element omission, element distortion, and element misarrangement. We further reveal that a focus on smaller visual segments can help multimodal large language models (MLLMs) mitigate these failures in the generation process. In this paper, we propose DCGen, a divide-and-conquer-based approach to automate the translation of webpage design to UI code. DCGen starts by dividing screenshots into manageable segments, generating code for each segment, and then reassembling them into complete UI code for the entire screenshot. We conduct extensive testing with a dataset composed of real-world websites and various MLLMs, and demonstrate that DCGen achieves up to a 15% improvement in visual similarity and 8% in code similarity for large input images. Human evaluations show that DCGen helps developers implement webpages significantly faster and with greater similarity to the UI designs. To the best of our knowledge, DCGen is the first segment-aware MLLM-based approach for generating UI code directly from screenshots.
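The divide-generate-reassemble pipeline described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the horizontal-strip segmentation, the `fake_mllm` stub standing in for a real MLLM call, and all function names here are hypothetical assumptions for demonstration.

```python
# Hypothetical sketch of a divide-and-conquer design-to-code pipeline.
# Segmentation is simplified to fixed-height horizontal strips; the
# MLLM call is replaced by a stub that emits a placeholder <div>.

from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Segment:
    top: int      # top y-coordinate of the crop in the screenshot
    bottom: int   # bottom y-coordinate of the crop


def divide(height: int, max_segment: int) -> List[Segment]:
    """Split a screenshot of the given pixel height into horizontal strips."""
    return [Segment(y, min(y + max_segment, height))
            for y in range(0, height, max_segment)]


def conquer(segments: List[Segment],
            generate: Callable[[Segment], str]) -> str:
    """Generate a code fragment per segment, then reassemble them
    in top-to-bottom order into a single HTML document."""
    fragments = [generate(seg) for seg in segments]
    body = "\n".join(fragments)
    return f"<html>\n<body>\n{body}\n</body>\n</html>"


# Stub standing in for an MLLM call on the cropped segment image.
def fake_mllm(seg: Segment) -> str:
    return f'<div class="segment" data-top="{seg.top}"></div>'


if __name__ == "__main__":
    segs = divide(height=2400, max_segment=1000)
    print(conquer(segs, fake_mllm))
```

In an actual system, `fake_mllm` would crop the screenshot to the segment's bounding box and prompt a vision-language model for that region's code, which is the mechanism the paper credits with reducing element omission and misarrangement.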