🤖 AI Summary
When the style ambiguity objective is used to train diffusion models, it depends on a pretrained classifier and labeled data. To remove that dependence, this paper proposes classifier-free and label-free style ambiguity losses. Methodologically, it uses CLIP for zero-shot semantic alignment and combines diffusion model fine-tuning with contrastive modeling of style uncertainty. This achieves, for the first time, fully unsupervised style ambiguity optimization. In user studies, the method improves perceived novelty by +23.6% and aesthetic acceptability by +18.4%, and it outperforms all baselines on automated evaluation metrics. The implementation is open-sourced, offering a scalable, low-dependency approach to creative image generation.
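The sketch below shows how CLIP-style zero-shot alignment can replace a trained style classifier. It makes several assumptions: the embeddings are plain Python lists standing in for CLIP image and text embeddings, the style prompts and the temperature of 100 are hypothetical, and the loss is written as cross-entropy against a uniform style distribution. It illustrates the general idea, not the paper's exact loss.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def softmax(logits: Sequence[float]) -> list[float]:
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def clip_style_ambiguity_loss(
    image_embedding: Sequence[float],
    style_text_embeddings: Sequence[Sequence[float]],
    temperature: float = 100.0,  # assumed logit scale, for illustration only
) -> float:
    """Hypothetical classifier-free style ambiguity loss.

    style_text_embeddings stand in for text embeddings of prompts such as
    "a painting in the style of {style}"; no classifier is trained.
    """
    # Zero-shot style "classifier": scaled cosine similarity to each style prompt.
    logits = [temperature * cosine_similarity(image_embedding, t)
              for t in style_text_embeddings]
    probs = softmax(logits)
    k = len(probs)
    # Cross-entropy with the uniform distribution: -(1/K) * sum_k log p_k.
    # Its minimum, log K, is reached when the image matches no style more than another.
    return -sum(math.log(p) for p in probs) / k


if __name__ == "__main__":
    styles = [[1.0, 0.0], [0.0, 1.0]]
    print(clip_style_ambiguity_loss([1.0, 1.0], styles))  # ambiguous image: log 2
    print(clip_style_ambiguity_loss([1.0, 0.0], styles))  # clearly one style: much larger
```

Because the style probabilities come only from comparing embeddings with text prompts, a different set of style names can be used without retraining anything. In a real training loop, the same computation would run on differentiable tensors so that gradients reach the diffusion model.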
📝 Abstract
In this work, we explore applying the style ambiguity training objective, originally used to approximate creativity, to a diffusion model. However, this objective requires a pretrained classifier and a labeled dataset. We introduce new forms of style ambiguity loss that require neither a trained classifier nor a labeled dataset, and show that our new methods score higher on both automated metrics and user studies of novelty and appreciation. Code is available at https://github.com/jamesBaker361/clipcreate
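The following shows why the original objective needs a classifier. It uses a common way to write style ambiguity: the cross-entropy between a style classifier's prediction and the uniform distribution over $K$ styles. This form is assumed for illustration, and the paper's exact formulation may differ. Here $p_\theta(c_k \mid x)$ is a classifier trained on style-labeled images, and $x$ is a generated image.

```latex
\mathcal{L}_{\text{amb}}(x)
  = -\sum_{k=1}^{K} \frac{1}{K} \log p_\theta(c_k \mid x)
  = H\!\left(u, p_\theta(\cdot \mid x)\right)
  \;\ge\; H(u) = \log K
```

Equality holds only when $p_\theta(\cdot \mid x)$ is uniform. Minimizing the loss therefore pushes generated images toward outputs that the classifier cannot confidently assign to any single style. Evaluating $p_\theta$ requires a classifier fit to a labeled style dataset, and that is the dependency the new losses are designed to avoid.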