🤖 AI Summary
Existing approaches to graphic design generation struggle to achieve high visual fidelity and fine-grained structural editability at the same time: they tend to produce either non-editable raster images or abstract layouts without visual content. This work, DesignAsCode, reframes design as a programmatic synthesis task in HTML/CSS and introduces a Plan-Implement-Reflect pipeline: a Semantic Planner constructs a dynamic, variable-depth element hierarchy, and a Visual-Aware Reflection mechanism iteratively refines the generated code to correct rendering artifacts. Because designs are represented natively as code, the method also supports automatic layout retargeting, complex document generation (e.g., resumes), and CSS-based animation. Experiments show that the approach significantly outperforms state-of-the-art baselines in both structural validity and aesthetic quality.
📝 Abstract
Graphic design generation demands a delicate balance between high visual fidelity and fine-grained structural editability. However, existing approaches typically bifurcate into either non-editable raster image synthesis or abstract layout generation devoid of visual content. Recent combinations of these two approaches attempt to bridge this gap but often suffer from rigid composition schemas and unresolvable visual dissonances (e.g., text-background conflicts) due to their inexpressive representation and open-loop nature. To address these challenges, we propose DesignAsCode, a novel framework that reimagines graphic design as a programmatic synthesis task using HTML/CSS. Specifically, we introduce a Plan-Implement-Reflect pipeline, incorporating a Semantic Planner to construct dynamic, variable-depth element hierarchies and a Visual-Aware Reflection mechanism that iteratively optimizes the code to rectify rendering artifacts. Extensive experiments demonstrate that DesignAsCode significantly outperforms state-of-the-art baselines in both structural validity and aesthetic quality. Furthermore, our code-native representation unlocks advanced capabilities, including automatic layout retargeting, complex document generation (e.g., resumes), and CSS-based animation.
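To make the Plan-Implement-Reflect idea concrete, here is a minimal Python sketch under strong assumptions. It is not the authors' implementation: the `Element` schema, the function names, and the inline-style serialization are all hypothetical. Most importantly, the reflection step here is a simple rule-based stand-in (a WCAG contrast-ratio check for text-background conflicts), whereas the abstract describes a visual-aware mechanism that works on rendered output.

```python
from dataclasses import dataclass, field
from html import escape


@dataclass
class Element:
    """One node of a variable-depth design tree (hypothetical schema)."""
    tag: str
    style: dict = field(default_factory=dict)
    text: str = ""
    children: list = field(default_factory=list)


def implement(el):
    """Serialize the planned tree to HTML with inline CSS."""
    css = "; ".join(f"{k}: {v}" for k, v in el.style.items())
    attr = f' style="{escape(css)}"' if css else ""
    inner = escape(el.text) + "".join(implement(c) for c in el.children)
    return f"<{el.tag}{attr}>{inner}</{el.tag}>"


def luminance(hex_color):
    """WCAG relative luminance of a '#rrggbb' color."""
    channels = [int(hex_color[i:i + 2], 16) / 255 for i in (1, 3, 5)]
    lin = [c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4
           for c in channels]
    return 0.2126 * lin[0] + 0.7152 * lin[1] + 0.0722 * lin[2]


def contrast(fg, bg):
    """WCAG contrast ratio between two colors (1.0 to 21.0)."""
    hi, lo = sorted((luminance(fg), luminance(bg)), reverse=True)
    return (hi + 0.05) / (lo + 0.05)


def find_conflicts(el, bg="#ffffff", fg="#000000"):
    """Return (element, effective background) pairs for low-contrast text."""
    bg = el.style.get("background-color", bg)
    fg = el.style.get("color", fg)  # CSS color is inherited by children
    found = [(el, bg)] if el.text and contrast(fg, bg) < 4.5 else []
    for child in el.children:
        found += find_conflicts(child, bg, fg)
    return found


def reflect(root, max_rounds=3):
    """Closed loop: check, patch the tree, re-check, then emit the code."""
    for _ in range(max_rounds):
        conflicts = find_conflicts(root)
        if not conflicts:
            break
        for el, bg in conflicts:
            # Stand-in fix: choose black or white, whichever contrasts more.
            el.style["color"] = max(("#000000", "#ffffff"),
                                    key=lambda c: contrast(c, bg))
    return implement(root)


# "Plan": a small hand-written hierarchy with one text-background conflict.
poster = Element("div", {"background-color": "#1a1a2e"}, children=[
    Element("h1", {"color": "#222244"}, "Summer Sale"),
    Element("section", children=[
        Element("p", {"color": "#ffffff"}, "Up to 50% off"),
    ]),
])
html = reflect(poster)
print(html)
```

In this toy run, the dark heading on the dark background is the only conflict, and the loop recolors it to white before serializing. Because the design stays a tree of elements with CSS properties, the correction is a targeted property edit rather than a regeneration of pixels, which is the kind of editability the code-native representation is meant to provide.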