WebGen-R1: Incentivizing Large Language Models to Generate Functional and Aesthetic Websites with Reinforcement Learning

📅 2026-04-22

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Current large language models struggle to generate multi-page websites that are both functionally correct and visually appealing: existing approaches are often limited to single-page static output, agentic pipelines incur high inference costs, and reliable reward signals for RL training are hard to design. This work proposes an end-to-end reinforcement learning framework that combines scaffold-guided structured code generation with a cascaded multimodal reward (structural constraints, functional execution feedback, and visual aesthetic evaluation) to jointly optimize functionality and aesthetics. Using a compact 7B-parameter LLM, the method produces deployable, aesthetically aligned multi-page websites. It outperforms open-source models of up to 72B parameters and achieves functional success comparable to the 671B-parameter DeepSeek-R1, while exceeding DeepSeek-R1 in valid rendering and aesthetic alignment.

📝 Abstract

While Large Language Models (LLMs) excel at function-level code generation, project-level tasks such as generating functional and visually aesthetic multi-page websites remain highly challenging. Existing work is often limited to single-page static websites, while agentic frameworks typically rely on multi-turn execution with proprietary models, leading to substantial token costs, high latency, and brittle integration. Training a small LLM end-to-end with reinforcement learning (RL) is a promising alternative, yet it faces a critical bottleneck: designing reliable and computationally feasible rewards for website generation. Unlike single-file coding tasks, which can be verified by unit tests, website generation requires evaluating inherently subjective aesthetics, cross-page interactions, and functional correctness. To this end, we propose WebGen-R1, an end-to-end RL framework tailored for project-level website generation. We first introduce a scaffold-driven structured generation paradigm that constrains the large open-ended action space and preserves architectural integrity. We then design a novel cascaded multimodal reward that couples structural guarantees with execution-grounded functional feedback and vision-based aesthetic supervision. Extensive experiments demonstrate that WebGen-R1 transforms a 7B base model from generating nearly nonfunctional websites into producing deployable, aesthetically aligned multi-page websites. WebGen-R1 not only consistently outperforms heavily scaled open-source models (up to 72B), but also rivals the state-of-the-art DeepSeek-R1 (671B) in functional success, while substantially exceeding it in valid rendering and aesthetic alignment. These results position WebGen-R1 as a viable path for scaling small open models from function-level code generation to project-level web application generation.
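
The abstract does not give the reward formula, so the following is only a minimal sketch of what a cascaded reward of this kind could look like, not the authors' implementation. It assumes that a structural check gates a rendering check, and that only a site passing both receives a blend of functional and aesthetic scores. The names `RewardSignals` and `cascaded_reward`, the penalty values, and the weights are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class RewardSignals:
    """Hypothetical per-sample signals; field names are assumptions."""
    structure_ok: bool       # generated project respects the scaffold layout
    renders: bool            # project builds and renders without errors
    functional_score: float  # fraction of functional checks passed, in [0, 1]
    aesthetic_score: float   # score from a vision-based judge, in [0, 1]


def cascaded_reward(
    s: RewardSignals,
    w_func: float = 0.5,            # assumed weight, not from the paper
    w_aes: float = 0.5,             # assumed weight, not from the paper
    structure_penalty: float = -1.0,
    render_penalty: float = -0.5,
) -> float:
    """Evaluate cheap checks first and stop at the first failed stage."""
    if not s.structure_ok:
        # Stage 1 failed: skip execution and vision scoring entirely.
        return structure_penalty
    if not s.renders:
        # Stage 2 failed: nothing to screenshot, so no aesthetic score.
        return render_penalty
    # Stage 3: blend execution-grounded and vision-based feedback.
    return w_func * s.functional_score + w_aes * s.aesthetic_score


if __name__ == "__main__":
    sample = RewardSignals(True, True, functional_score=1.0, aesthetic_score=0.5)
    print(cascaded_reward(sample))
```

Ordering the stages from cheapest to most expensive means the costlier execution and vision-based evaluations run only for outputs that already satisfy the earlier checks, which is one plausible way to keep the reward computationally feasible.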

Problem

Research questions and friction points this paper is trying to address.

website generation

functional correctness

aesthetic evaluation

project-level code generation

multimodal reward

Innovation

Methods, ideas, or system contributions that make the work stand out.

reinforcement learning

website generation

multimodal reward