Widget2Code: From Visual Widgets to UI Code via Multimodal LLMs

📅 2025-12-22
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the long-overlooked widget-level UI2Code problem—generating executable frontend code from a single, unannotated widget image under severe constraints: no semantic markup, extreme spatial limitations, and minimal contextual cues. To this end, the authors formally introduce the Widget2Code task and propose the first end-to-end framework that produces visually faithful, layout-compact, and executable code. They contribute (1) WidgetBench, the first purely image-based benchmark dataset for widgets; (2) WidgetDSL, a lightweight domain-specific language with a cross-framework compiler supporting React and HTML/CSS; and (3) a novel architecture integrating icon retrieval, visual module reuse, and adaptive spatial rendering. Extensive experiments demonstrate significant improvements over state-of-the-art baselines across fine-grained, multi-dimensional metrics. This is the first method to achieve high-fidelity, executable widget-level UI-to-code generation, bridging a critical gap in UI2Code research for micro-interfaces.
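The paper does not reproduce WidgetDSL's grammar here, but its core idea, a single framework-agnostic widget tree with one emitter per target framework, can be sketched. Everything below (the `WidgetNode` shape, `toHtml`, `toReact`) is a hypothetical TypeScript illustration of that concept, not the authors' implementation.

```typescript
// Hypothetical sketch of the WidgetDSL idea. Node shape, field names, and the
// emitters below are illustrative assumptions, not the paper's actual grammar.

type WidgetNode =
  | { kind: "stack"; direction: "row" | "column"; gap: number; children: WidgetNode[] }
  | { kind: "text"; content: string; size: number }
  | { kind: "icon"; name: string };

// Emit plain HTML/CSS from the framework-agnostic tree.
function toHtml(node: WidgetNode): string {
  switch (node.kind) {
    case "stack":
      return `<div style="display:flex;flex-direction:${node.direction};gap:${node.gap}px">` +
        node.children.map(toHtml).join("") + `</div>`;
    case "text":
      return `<span style="font-size:${node.size}px">${node.content}</span>`;
    case "icon":
      return `<i class="icon-${node.name}"></i>`;
  }
}

// A second emitter targeting React (JSX source) shows why one DSL tree can
// back multiple front-end frameworks.
function toReact(node: WidgetNode): string {
  switch (node.kind) {
    case "stack":
      return `<div style={{display:"flex",flexDirection:"${node.direction}",gap:${node.gap}}}>` +
        node.children.map(toReact).join("") + `</div>`;
    case "text":
      return `<span style={{fontSize:${node.size}}}>${node.content}</span>`;
    case "icon":
      return `<Icon name="${node.name}" />`;
  }
}

// Example: a weather-widget header with an icon next to a temperature label.
const widget: WidgetNode = {
  kind: "stack",
  direction: "row",
  gap: 8,
  children: [
    { kind: "icon", name: "sun" },
    { kind: "text", content: "21°C", size: 14 },
  ],
};

console.log(toHtml(widget));
console.log(toReact(widget));
```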

📝 Abstract
User interface to code (UI2Code) aims to generate executable code that can faithfully reconstruct a given input UI. Prior work focuses largely on web pages and mobile screens, leaving app widgets underexplored. Unlike web or mobile UIs with rich hierarchical context, widgets are compact, context-free micro-interfaces that summarize key information through dense layouts and iconography under strict spatial constraints. Moreover, while (image, code) pairs are widely available for web or mobile UIs, widget designs are proprietary and lack accessible markup. We formalize this setting as the Widget-to-Code (Widget2Code) task and introduce an image-only widget benchmark with fine-grained, multi-dimensional evaluation metrics. Benchmarking shows that although generalized multimodal large language models (MLLMs) outperform specialized UI2Code methods, they still produce unreliable and visually inconsistent code. To address these limitations, we develop a baseline that jointly advances perceptual understanding and structured code generation. At the perceptual level, we follow widget design principles to assemble atomic components into complete layouts, equipped with icon retrieval and reusable visualization modules. At the system level, we design an end-to-end infrastructure, WidgetFactory, which includes a framework-agnostic widget-tailored domain-specific language (WidgetDSL) and a compiler that translates it into multiple front-end implementations (e.g., React, HTML/CSS). An adaptive rendering module further refines spatial dimensions to satisfy compactness constraints. Together, these contributions substantially enhance visual fidelity, establishing a strong baseline and unified infrastructure for future Widget2Code research.
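The abstract names icon retrieval as part of the perceptual pipeline but does not detail its mechanics. One plausible mechanism is nearest-neighbor lookup over a library of icon embeddings; the sketch below assumes exactly that, and the encoder, the `IconEntry` type, and `retrieveIcon` are illustrative assumptions, not the paper's method.

```typescript
// Minimal sketch of icon retrieval by embedding similarity. All names and the
// cosine-similarity policy here are assumptions for illustration.

type IconEntry = { name: string; embedding: number[] };

function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Given the embedding of a cropped icon region from the widget screenshot
// (produced by some vision encoder), return the closest icon in the library.
function retrieveIcon(query: number[], library: IconEntry[]): IconEntry {
  return library.reduce((best, entry) =>
    cosine(query, entry.embedding) > cosine(query, best.embedding) ? entry : best
  );
}

// Example with toy 3-dimensional embeddings.
const library: IconEntry[] = [
  { name: "sun", embedding: [0.9, 0.1, 0.0] },
  { name: "cloud", embedding: [0.1, 0.9, 0.1] },
];
console.log(retrieveIcon([0.8, 0.2, 0.1], library).name); // "sun"
```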
Problem

Research questions and friction points this paper is trying to address.

Generating executable code from compact, context-free widget images that lack accessible markup.
Addressing the unreliable and visually inconsistent code produced by multimodal large language models.
Improving visual fidelity and structured code generation for widget-to-code translation.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Multimodal LLMs for widget visual understanding
WidgetDSL, a framework-agnostic domain-specific language
Adaptive rendering for compact spatial constraints (see the sketch after this list)
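The adaptive rendering module is described only at a high level: it "refines spatial dimensions to satisfy compactness constraints." A minimal sketch of that idea, assuming a simple uniform-downscale policy with a legibility floor on font size, might look like the following; `fitToWidget` and its thresholds are hypothetical, not the paper's procedure.

```typescript
// Sketch of adaptive rendering: uniformly rescale a rendered layout so it fits
// the widget's fixed footprint. The clamping policy is an assumption.

interface Box { width: number; height: number; fontSize: number }

function fitToWidget(content: Box, target: { width: number; height: number }): Box {
  // Scale down just enough that both dimensions fit; never scale up.
  const scale = Math.min(1, target.width / content.width, target.height / content.height);
  return {
    width: Math.round(content.width * scale),
    height: Math.round(content.height * scale),
    // Keep text legible by flooring the font size at 10px (assumed threshold).
    fontSize: Math.max(10, Math.round(content.fontSize * scale)),
  };
}

// Example: shrink a 400x180 rendering into a 320x120 small-widget slot.
console.log(fitToWidget({ width: 400, height: 180, fontSize: 16 }, { width: 320, height: 120 }));
```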
Authors

Houston H. Zhang
McMaster University
Tao Zhang
University of Toronto
Baoze Lin
McMaster University
Yuanqi Xue
McMaster University
Yincheng Zhu
University of Waterloo
Huan Liu
McMaster University
Li Gu
Concordia University
Linfeng Ye
University of Toronto
Information Theory, Computer Vision, Computational Pathology
Ziqiang Wang
Concordia University
Computer Vision
Xinxin Zuo
Concordia University
Deep Learning, Computer Vision, Multimedia, Computer Graphics
Yang Wang
Concordia University
Yuanhao Yu
McMaster University
Zhixiang Chi
University of Toronto
Computer Vision, Machine Learning