🤖 AI Summary
Conventional creativity evaluation treats creativity as a single scalar value, neglecting its inherently multidimensional nature. Method: We propose CREward, the first type-specific creativity reward model for images, which quantifies creativity along three interpretable dimensions: geometry, material, and texture. CREward is trained via supervised multi-task learning on labels generated by large vision-language models (LVLMs), whose judgments are first validated against a human-perception benchmark. It is then integrated as a reward signal to guide LoRA-based fine-tuning for creative image generation. Contribution/Results: CREward significantly outperforms baselines in cross-type creativity assessment, enables fine-grained attribution analysis, and improves both diversity and human preference scores of generated images. To our knowledge, this is the first work to model creativity separately by type and to use such a model to optimize creative image generation end to end.
📝 Abstract
Creativity is a complex phenomenon. When it comes to representing and assessing creativity, treating it as a single undifferentiated quantity would appear naive and underwhelming. In this work, we learn the first type-specific creativity reward model, coined CREward, which spans three creativity "axes," geometry, material, and texture, to allow us to view creativity through the lens of the image formation pipeline. To build our reward model, we first conduct a human benchmark evaluation to capture human perception of creativity for each type across various creative images. We then analyze the correlation between human judgments and predictions by large vision-language models (LVLMs), confirming that LVLMs exhibit strong alignment with human perception. Building on this observation, we collect LVLM-generated labels to train our CREward model that is applicable to both evaluation and generation of creative images. We explore three applications of CREward: creativity assessment, explainable creativity, and creative sample acquisition for both human design inspiration and guiding creative generation through low-rank adaptation.
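The human-versus-LVLM correlation analysis described above can be illustrated with a small sketch. The abstract does not say which correlation statistic the authors use, so Spearman rank correlation is an assumption here, and the function names and the toy score lists are hypothetical placeholders rather than anything from the paper.

```python
from statistics import mean

# The three creativity axes named in the abstract.
AXES = ("geometry", "material", "texture")


def ranks(values):
    """Return 1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation; undefined (division by zero) if x or y is constant."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    return pearson(ranks(x), ranks(y))


def per_axis_agreement(human, model):
    """Correlate human and model scores separately for each axis."""
    return {axis: spearman(human[axis], model[axis]) for axis in AXES}


if __name__ == "__main__":
    # Hypothetical scores for five images; not data from the paper.
    human = {
        "geometry": [1, 4, 2, 5, 3],
        "material": [2, 2, 5, 4, 1],
        "texture": [3, 1, 4, 5, 2],
    }
    model = {
        "geometry": [0.2, 0.7, 0.3, 0.9, 0.5],
        "material": [0.4, 0.3, 0.8, 0.9, 0.1],
        "texture": [0.5, 0.2, 0.6, 0.9, 0.4],
    }
    for axis, rho in per_axis_agreement(human, model).items():
        print(f"{axis}: {rho:.3f}")
```

Scoring each axis separately mirrors the type-specific framing of the abstract: a model could agree with human raters on geometry while diverging on texture, which a single pooled correlation would hide.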