🤖 AI Summary
Direct LLM-based generation of complete, multi-file UI code faces several challenges: complex prompt engineering, verbose and unmaintainable outputs, and poor structural controllability. To address these, we propose a scaffolded code generation framework grounded in intermediate representations (IRs) that decompose UI development into three structured artifacts—application storyboards, data models, and GUI skeletons—enabling iterative, human-LLM co-construction. The IR-driven approach explicitly models navigation flows and component dependencies to guide multi-file code synthesis, improving the interpretability and maintainability of the output while limiting errors. In a user study, 75% of participants preferred our system over a conventional chat-based baseline for prototyping apps.
📝 Abstract
Generating the code for a complete user interface with a Large Language Model (LLM) is challenging. User interfaces are complex, and their implementations often consist of multiple interrelated files that together specify the contents of each screen, the navigation flows between the screens, and the data model used throughout the application. It is difficult to craft a single prompt that contains enough detail for an LLM to generate a complete user interface, and even then the result is frequently one large, hard-to-understand file containing all of the generated screens. In this paper, we introduce Athena, a prototype application generation environment that demonstrates how shared intermediate representations—an app storyboard, a data model, and GUI skeletons—can help a developer work with an LLM iteratively to craft a complete user interface. These intermediate representations also scaffold the LLM's code generation process, producing organized, structured code across multiple files while limiting errors. We evaluated Athena in a user study and found that 75% of participants preferred our prototype over a typical chatbot-style baseline for prototyping apps.
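To make the idea of shared intermediate representations concrete, here is a minimal, purely illustrative sketch—the paper does not publish Athena's actual IR schema, so the class names, field layouts, and the reachability check below are all assumptions—of how a storyboard, data model, and GUI skeletons might be modeled so a tool could validate them before prompting an LLM to emit one code file per screen:

```python
# Illustrative sketch only: Entity, Screen, and Storyboard are
# hypothetical names, not Athena's real IR. They show how the three
# artifacts (data model, GUI skeletons, storyboard) could be structured.
from dataclasses import dataclass

@dataclass
class Entity:                       # data model: one record type
    name: str
    fields: dict[str, str]          # field name -> type name

@dataclass
class Screen:                       # GUI skeleton: one screen's widgets
    name: str
    widgets: list[str]              # e.g. ["list:Task", "button:Add"]

@dataclass
class Storyboard:                   # navigation flows between screens
    screens: list[Screen]
    edges: list[tuple[str, str]]    # (from_screen, to_screen)

    def reachable(self, start: str) -> set[str]:
        """Screens reachable from `start` -- a structural check a tool
        could run before asking an LLM to generate per-screen files."""
        seen, stack = set(), [start]
        while stack:
            s = stack.pop()
            if s in seen:
                continue
            seen.add(s)
            stack.extend(dst for src, dst in self.edges if src == s)
        return seen

# A toy to-do app expressed in this hypothetical IR:
model = [Entity("Task", {"title": "str", "done": "bool"})]
board = Storyboard(
    screens=[Screen("TaskList", ["list:Task", "button:Add"]),
             Screen("AddTask", ["textfield:title", "button:Save"])],
    edges=[("TaskList", "AddTask"), ("AddTask", "TaskList")],
)
print(board.reachable("TaskList"))  # both screens are reachable
```

Because the IR is explicit and machine-checkable, a developer and an LLM can iterate on the storyboard and data model first, and only then generate code—one file per screen—keeping the output organized rather than monolithic.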