Code Aesthetics with Agentic Reward Feedback

📅 2025-10-27

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Problem: Large language models (LLMs) often produce code with poor aesthetic quality in visually oriented programming tasks. Method: This paper proposes a pipeline for improving code aesthetics: (1) constructing AesCode-358K, a large-scale instruction-tuning dataset focused on code aesthetics; (2) designing Agentic Reward Feedback, a multi-agent evaluation system that scores executability, static aesthetics, and interactive aesthetics; (3) introducing GRPO-AR, which integrates these reward signals into the GRPO reinforcement learning algorithm to jointly optimize functionality and aesthetics; and (4) releasing OpenDesign, an open-source benchmark for assessing code aesthetics, alongside the lightweight model AesCoder-4B. Contribution/Results: Supervised fine-tuning on AesCode-358K combined with reinforcement learning from agentic reward feedback significantly improves performance on OpenDesign and also improves results on PandasPlotBench. AesCoder-4B surpasses GPT-4o and GPT-4.1 and performs comparably to large open-source LLMs with 480B–685B parameters.

📝 Abstract

Large Language Models (LLMs) have become valuable assistants for developers in code-related tasks. While LLMs excel at traditional programming tasks such as code generation and bug fixing, they struggle with visually-oriented coding tasks, often producing suboptimal aesthetics. In this paper, we introduce a new pipeline to enhance the aesthetic quality of LLM-generated code. We first construct AesCode-358K, a large-scale instruction-tuning dataset focused on code aesthetics. Next, we propose agentic reward feedback, a multi-agent system that evaluates executability, static aesthetics, and interactive aesthetics. Building on this, we develop GRPO-AR, which integrates these signals into the GRPO algorithm for joint optimization of functionality and code aesthetics. Finally, we develop OpenDesign, a benchmark for assessing code aesthetics. Experimental results show that combining supervised fine-tuning on AesCode-358K with reinforcement learning using agentic reward feedback significantly improves performance on OpenDesign and also enhances results on existing benchmarks such as PandasPlotBench. Notably, our AesCoder-4B surpasses GPT-4o and GPT-4.1, and achieves performance comparable to large open-source models with 480B-685B parameters, underscoring the effectiveness of our approach.
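
To make the idea of folding several reward signals into GRPO more concrete, here is a minimal sketch. It is not the paper's implementation: the `AgentScores` container, the weights, the zero reward for non-executable code, and the 0-to-1 score ranges are all assumptions chosen for illustration. The only part taken from the standard GRPO formulation is the group-relative advantage, where each sampled response's reward is normalized by the mean and standard deviation of its group.

```python
from dataclasses import dataclass
from statistics import mean, pstdev


@dataclass
class AgentScores:
    """Hypothetical per-response scores from the three evaluation agents."""
    executable: bool          # did the generated code run / render?
    static_score: float       # assumed range 0..1, static aesthetics
    interactive_score: float  # assumed range 0..1, interactive aesthetics


def combined_reward(s: AgentScores,
                    w_exec: float = 1.0,
                    w_static: float = 1.0,
                    w_inter: float = 1.0) -> float:
    """Collapse the three signals into one scalar (weights are assumptions)."""
    if not s.executable:
        # Assumption: code that does not execute earns no aesthetic credit.
        return 0.0
    return w_exec + w_static * s.static_score + w_inter * s.interactive_score


def group_relative_advantages(rewards: list[float],
                              eps: float = 1e-6) -> list[float]:
    """GRPO-style advantage: normalize each reward within its sampled group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; eps guards against sigma == 0
    return [(r - mu) / (sigma + eps) for r in rewards]


# One prompt, three sampled responses scored by the (hypothetical) agents.
group = [
    AgentScores(executable=False, static_score=0.9, interactive_score=0.9),
    AgentScores(executable=True, static_score=0.5, interactive_score=0.5),
    AgentScores(executable=True, static_score=1.0, interactive_score=1.0),
]
rewards = [combined_reward(s) for s in group]
advantages = group_relative_advantages(rewards)
```

In this sketch the non-executable response receives the lowest advantage regardless of its aesthetic scores, which is one simple way to keep functionality and aesthetics coupled in a single objective. How the paper actually weights or gates the signals is not stated in the abstract.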

Problem

Research questions and friction points this paper is trying to address.

Improving aesthetic quality of LLM-generated code

Addressing suboptimal aesthetics in visual coding tasks

Joint optimization of functionality and code aesthetics

Innovation

Methods, ideas, or system contributions that make the work stand out.

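Constructed AesCode-358K, a large-scale instruction-tuning dataset focused on code aesthetics

Proposed agentic reward feedback, a multi-agent system that evaluates executability, static aesthetics, and interactive aesthetics

Developed GRPO-AR, which integrates these reward signals into GRPO to jointly optimize functionality and aesthetics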
