Improving User Interface Generation Models from Designer Feedback

📅 2025-09-20
📈 Citations: 0
Influential: 0
🤖 AI Summary
Most large language models (LLMs) struggle to generate well-designed user interfaces (UIs), in part because existing rating- and ranking-based RLHF methods are poorly aligned with designers' real workflows and discard the rich rationale designers use to critique and improve UI designs. This paper investigates feedback mechanisms grounded in familiar design interactions, namely commenting, sketching, and direct manipulation. A study with 21 designers using these interactions produced roughly 1,500 design annotations, which the authors use to fine-tune a series of LLMs for higher-quality UI generation. In human evaluation, the resulting designer-aligned models outperform models trained with traditional ranking feedback and all tested baselines, including GPT-5, on quality of the generated UIs.

📝 Abstract
Despite being trained on vast amounts of data, most LLMs are unable to reliably generate well-designed UIs. Designer feedback is essential to improving performance on UI generation; however, we find that existing RLHF methods based on ratings or rankings are not well aligned with designers' workflows and ignore the rich rationale used to critique and improve UI designs. In this paper, we investigate several approaches for designers to give feedback to UI generation models, using familiar interactions such as commenting, sketching, and direct manipulation. We first perform a study with 21 designers where they gave feedback using these interactions, which resulted in ~1,500 design annotations. We then use this data to fine-tune a series of LLMs to generate higher-quality UIs. Finally, we evaluate these models with human judges, and we find that our designer-aligned approaches outperform models trained with traditional ranking feedback and all tested baselines, including GPT-5.
Problem

Research questions and friction points this paper is trying to address.

LLMs cannot reliably generate well-designed user interfaces
Existing rating- and ranking-based RLHF methods are misaligned with designers' workflows and discard their critique rationale
Designer feedback methods need alignment with familiar interaction patterns
Innovation

Methods, ideas, or system contributions that make the work stand out.

Fine-tuning LLMs with designer feedback interactions
Using commenting, sketching, and direct manipulation inputs
Training models with rich rationale from design annotations
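To make the training setup above concrete, here is a minimal sketch of how one rationale-rich design annotation might be converted into a supervised fine-tuning pair. The schema (`DesignAnnotation`, `to_sft_pair`, the field names) is entirely hypothetical; the paper's actual data format is not described on this page.

```python
from dataclasses import dataclass

# Hypothetical annotation schema; the paper's real format is not public here.
@dataclass
class DesignAnnotation:
    ui_code: str        # draft UI being critiqued (e.g., HTML/CSS)
    feedback_type: str  # "comment" | "sketch" | "direct_edit"
    rationale: str      # designer's written critique or explanation
    revised_ui: str     # UI after the designer's feedback is applied

def to_sft_pair(a: DesignAnnotation) -> dict:
    """Turn one annotation into a fine-tuning example: the model learns to
    map (draft UI + rationale-rich feedback) -> revised UI."""
    prompt = (
        f"Original UI:\n{a.ui_code}\n\n"
        f"Designer feedback ({a.feedback_type}): {a.rationale}\n\n"
        "Revise the UI to address the feedback."
    )
    return {"prompt": prompt, "completion": a.revised_ui}
```

The point of this shape is that the rationale text travels with the example, so the model is trained on *why* a revision was made rather than on a bare preference ranking.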