🤖 AI Summary
This paper introduces the novel task of “code-to-style image generation,” aiming to synthesize high-fidelity, visually coherent, and controllable stylized images from a single numeric code—without relying on textual prompts, reference images, or model fine-tuning. To this end, the authors propose CoTyle, the first open-source framework for numeric style control: it employs a discrete style codebook and an autoregressive style generator to map numeric codes into structured, semantically grounded style representations; these are then injected as conditional signals into a text-to-image diffusion model via a dedicated style-conditioning mechanism. Trained on large-scale stylistic data, the codebook ensures both high style diversity and strong reproducibility. Extensive experiments demonstrate that CoTyle achieves state-of-the-art performance in style consistency, controllability, and creative expressiveness under a “one-code-one-style” paradigm, enabling concise, reproducible, and high-precision numerical control over visual style in diffusion-based image synthesis.
📝 Abstract
Innovative visual stylization is a cornerstone of artistic creation, yet generating novel and consistent visual styles remains a significant challenge. Existing generative approaches typically rely on lengthy textual prompts, reference images, or parameter-efficient fine-tuning to guide style-aware image generation, but often struggle with style consistency, limited creativity, and complex style representations. In this paper, we affirm that a style is worth one numerical code by introducing the novel task, code-to-style image generation, which produces images with novel, consistent visual styles conditioned solely on a numerical style code. To date, this task has been explored primarily by industry (e.g., Midjourney), with no open-source research from the academic community. To fill this gap, we propose CoTyle, the first open-source method for this task. Specifically, we first train a discrete style codebook from a collection of images to extract style embeddings. These embeddings serve as conditions for a text-to-image diffusion model (T2I-DM) to generate stylistic images. Subsequently, we train an autoregressive style generator on the discrete style embeddings to model their distribution, allowing the synthesis of novel style embeddings. During inference, a numerical style code is mapped to a unique style embedding by the style generator, and this embedding guides the T2I-DM to generate images in the corresponding style. Unlike existing methods, ours offers unparalleled simplicity and diversity, unlocking a vast space of reproducible styles from minimal input. Extensive experiments validate that CoTyle effectively turns a numerical code into a style controller, demonstrating that a style is worth one code.
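The inference pipeline described above (numeric code → unique style embedding → conditioned generation) can be sketched in miniature. The snippet below is a toy illustration, not CoTyle's implementation: `style_code_to_embedding` stands in for the autoregressive style generator (here, the code simply seeds a deterministic sampler), and `generate_stylized_image` is a hypothetical stub for the conditioned T2I-DM call. All names, dimensions, and the seeding scheme are assumptions chosen to show the "one code, one style" reproducibility property.

```python
import numpy as np

STYLE_DIM = 8  # toy per-token embedding size; the real codebook entries are learned
SEQ_LEN = 4    # toy number of autoregressive sampling steps

def style_code_to_embedding(code: int) -> np.ndarray:
    """Toy stand-in for the autoregressive style generator.

    The numeric style code seeds the sampling process, so the same
    code always reproduces the same style embedding, while different
    codes yield different embeddings (and hence different styles).
    """
    rng = np.random.default_rng(code)  # the code acts as the sampling seed
    tokens = [rng.standard_normal(STYLE_DIM) for _ in range(SEQ_LEN)]
    return np.concatenate(tokens)

def generate_stylized_image(prompt: str, style_code: int) -> dict:
    """Hypothetical stub for the conditioned T2I-DM: in CoTyle, the style
    embedding is injected as a conditioning signal alongside the prompt."""
    style_emb = style_code_to_embedding(style_code)
    return {"prompt": prompt, "style_embedding": style_emb}

# Same code -> identical style across prompts; different codes -> different styles.
a = generate_stylized_image("a cat", style_code=42)
b = generate_stylized_image("a dog", style_code=42)
c = generate_stylized_image("a cat", style_code=7)
assert np.allclose(a["style_embedding"], b["style_embedding"])
assert not np.allclose(a["style_embedding"], c["style_embedding"])
```

The design point the sketch captures is that the style code never needs to describe the style; it only needs to deterministically index into the space of styles the generator has learned, which is what makes styles reproducible and shareable from a single number.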