🤖 AI Summary
Existing design synthesis methods mostly treat the task as single-step generation, overlooking the incremental, layered nature of the creative process. To address this, we propose a new problem setting, Step-by-Step Layered Design Generation, which frames design as a progressive state-update process guided by a sequence of instructions. We introduce SLEDGE, a model built on multimodal large language models for this setting: it treats each update as an atomic, layered change over the previous design state and grounds that change in the corresponding instruction. We also construct a new dataset and benchmark to evaluate this setting. Experimental results show that SLEDGE significantly outperforms existing state-of-the-art methods in fidelity, controllability, and interpretability, supporting the effectiveness of the step-by-step layered approach to design synthesis.
📝 Abstract
Design generation is, in essence, a step-by-step process in which designers progressively refine and enhance their work through careful modifications. Despite this fundamental characteristic, existing approaches mainly treat design synthesis as a single-step generation problem, significantly underestimating the inherent complexity of the creative process. To bridge this gap, we propose a novel problem setting called Step-by-Step Layered Design Generation, which tasks a machine learning model with generating a design that adheres to a sequence of instructions from a designer. Leveraging recent advances in multi-modal LLMs, we propose SLEDGE (Step-by-step LayEred Design GEnerator), which models each update to a design as an atomic, layered change over its previous state, grounded in the corresponding instruction. To complement our new problem setting, we introduce a new evaluation suite, including a dataset and a benchmark. Our extensive experimental analysis and comparison with state-of-the-art approaches adapted to our new setup demonstrate the efficacy of our approach. We hope our work will attract attention to this pragmatic and under-explored research area.
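
The problem setting described above, where each instruction produces one atomic, layered change over the previous design state, can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration: the `Layer` and `Design` schemas, the `toy_generator` stand-in for the learned model, and the function names are hypothetical and are not taken from SLEDGE itself.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class Layer:
    """One design element (hypothetical schema, not from the paper)."""
    name: str
    kind: str      # e.g. "text" or "image"
    content: str


@dataclass(frozen=True)
class Design:
    """A design state: an ordered stack of layers, bottom first."""
    layers: Tuple[Layer, ...] = ()

    def with_layer(self, layer: Layer) -> "Design":
        # Atomic update: the previous state is left untouched and a new
        # state with exactly one extra layer on top is returned.
        return Design(self.layers + (layer,))


# A generator maps (previous design, instruction) to one new layer.
LayerGenerator = Callable[[Design, str], Layer]


def toy_generator(design: Design, instruction: str) -> Layer:
    """Stand-in for a learned model; it only echoes the instruction."""
    kind = "text" if "text" in instruction.lower() else "image"
    return Layer(name=f"layer_{len(design.layers)}", kind=kind, content=instruction)


def generate_step_by_step(
    instructions: List[str], generator: LayerGenerator
) -> List[Design]:
    """Apply instructions in order and return every intermediate state."""
    states = [Design()]
    for instruction in instructions:
        new_layer = generator(states[-1], instruction)
        states.append(states[-1].with_layer(new_layer))
    return states


if __name__ == "__main__":
    steps = ["Add a blue gradient background", "Add title text 'Summer Sale'"]
    for state in generate_step_by_step(steps, toy_generator):
        print([layer.kind for layer in state.layers])
```

Keeping every intermediate state, rather than only the final design, mirrors the step-by-step framing: each state can be inspected or compared against the instruction that produced it.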