A Style is Worth One Code: Unlocking Code-to-Style Image Generation with Discrete Style Space

📅 2025-11-13
📈 Citations: 0
Influential citations: 0
🤖 AI Summary
This paper introduces the novel task of “code-to-style image generation,” which aims to synthesize high-fidelity, visually coherent, and controllable stylized images from a single numeric code, without relying on textual prompts, reference images, or model fine-tuning. To this end, the authors propose CoTyle, the first open-source framework for numeric style control: it employs a discrete style codebook and an autoregressive style generator to map numeric codes into structured, semantically grounded style representations, which are then injected as conditional signals into a text-to-image diffusion model via a dedicated style-conditioning mechanism. Trained on large-scale stylistic data, the codebook ensures both high style diversity and strong reproducibility. Extensive experiments demonstrate that CoTyle achieves state-of-the-art performance in style consistency, controllability, and creative expressiveness under a “one-code-one-style” paradigm, making it the first method to enable concise, reproducible, and high-precision numerical control over visual style in diffusion-based image synthesis.
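To make the pipeline above concrete, here is a minimal sketch of the inference flow it describes: the numeric code deterministically seeds the style generator, and the resulting embedding conditions the text-to-image model. All module names and call signatures below (`style_generator.sample`, `style_condition`, etc.) are illustrative assumptions, not CoTyle's actual API.

```python
import torch

# Hypothetical inference sketch for code-to-style generation.
# None of these modules come from the CoTyle release; they only
# illustrate the "numeric code -> style embedding -> conditioned
# diffusion" flow described in the summary.

def generate_stylized_image(style_code: int, prompt: str,
                            style_generator, t2i_model):
    """Map a numeric style code to an image in the corresponding style."""
    # Seeding with the code makes sampling deterministic, so the same
    # code always reproduces the same style ("one code, one style").
    g = torch.Generator().manual_seed(style_code)
    with torch.no_grad():
        style_tokens = style_generator.sample(generator=g)   # discrete codebook indices
        style_emb = style_generator.embed(style_tokens)      # continuous style condition
        return t2i_model(prompt, style_condition=style_emb)  # conditioned T2I diffusion
```

The key property is that `manual_seed(style_code)` pins the sampler, so a code behaves like a reproducible style address rather than a random seed that drifts across runs.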

📝 Abstract
Innovative visual stylization is a cornerstone of artistic creation, yet generating novel and consistent visual styles remains a significant challenge. Existing generative approaches typically rely on lengthy textual prompts, reference images, or parameter-efficient fine-tuning to guide style-aware image generation, but often struggle with style consistency, limited creativity, and complex style representations. In this paper, we affirm that a style is worth one numerical code by introducing the novel task of code-to-style image generation, which produces images with novel, consistent visual styles conditioned solely on a numerical style code. To date, this field has been explored primarily by industry (e.g., Midjourney), with no open-source research from the academic community. To fill this gap, we propose CoTyle, the first open-source method for this task. Specifically, we first train a discrete style codebook from a collection of images to extract style embeddings. These embeddings serve as conditions for a text-to-image diffusion model (T2I-DM) to generate stylistic images. Subsequently, we train an autoregressive style generator on the discrete style embeddings to model their distribution, allowing the synthesis of novel style embeddings. During inference, a numerical style code is mapped to a unique style embedding by the style generator, and this embedding guides the T2I-DM to generate images in the corresponding style. Unlike existing methods, our method offers unparalleled simplicity and diversity, unlocking a vast space of reproducible styles from minimal input. Extensive experiments validate that CoTyle effectively turns a numerical code into a style controller, demonstrating that a style is worth one code.
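The “discrete style codebook” stage is, in spirit, vector quantization. The sketch below shows a generic VQ-VAE-style codebook of the kind the abstract describes; the codebook size, embedding width, and straight-through gradient estimator are common-practice assumptions rather than details confirmed by the paper.

```python
import torch
import torch.nn as nn

class StyleCodebook(nn.Module):
    """Generic VQ-style codebook: snaps continuous style features to
    the nearest of `num_codes` learned entries (sizes are assumptions)."""

    def __init__(self, num_codes: int = 1024, dim: int = 256):
        super().__init__()
        self.codes = nn.Embedding(num_codes, dim)

    def forward(self, style_feat: torch.Tensor):
        # style_feat: (batch, dim) features from some style encoder.
        dists = torch.cdist(style_feat, self.codes.weight)  # (batch, num_codes)
        indices = dists.argmin(dim=-1)                      # discrete style tokens
        quantized = self.codes(indices)
        # Straight-through estimator: copy gradients past the argmin
        # so the encoder still trains end-to-end.
        quantized = style_feat + (quantized - style_feat).detach()
        return quantized, indices
```

Because every style feature snaps to a finite set of entries, the space of styles becomes enumerable, which is what makes addressing a style with a single integer plausible.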
Problem

Research questions and friction points this paper is trying to address.

Generating novel consistent visual styles without complex inputs
Overcoming style inconsistency in existing image generation methods
Creating reproducible styles from minimal numerical code input
Innovation

Methods, ideas, or system contributions that make the work stand out.

Discrete style codebook extracts visual style embeddings
Autoregressive generator models novel style embedding distribution (see the sketch after this list)
Numerical style code guides diffusion model for generation
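As referenced above, here is a hedged sketch of what the autoregressive style generator could look like: a small decoder-only transformer over discrete codebook indices, sampled with an RNG seeded by the numeric style code. Depth, width, sequence length, and the BOS-token scheme are all illustrative assumptions, not the paper's architecture.

```python
import torch
import torch.nn as nn

class StyleGenerator(nn.Module):
    """Toy autoregressive prior over discrete style tokens (assumed sizes)."""

    def __init__(self, num_codes: int = 1024, dim: int = 256, seq_len: int = 16):
        super().__init__()
        self.seq_len = seq_len
        self.bos = num_codes                              # extra index for BOS
        self.tok_emb = nn.Embedding(num_codes + 1, dim)
        self.pos_emb = nn.Parameter(torch.zeros(seq_len + 1, dim))
        layer = nn.TransformerEncoderLayer(dim, nhead=4, batch_first=True)
        self.blocks = nn.TransformerEncoder(layer, num_layers=4)
        self.head = nn.Linear(dim, num_codes)

    @torch.no_grad()
    def sample(self, style_code: int) -> torch.Tensor:
        # Seeding with the numeric code makes sampling reproducible:
        # the same code always yields the same style token sequence.
        g = torch.Generator().manual_seed(style_code)
        tokens = torch.full((1, 1), self.bos, dtype=torch.long)
        for _ in range(self.seq_len):
            x = self.tok_emb(tokens) + self.pos_emb[: tokens.size(1)]
            mask = nn.Transformer.generate_square_subsequent_mask(tokens.size(1))
            h = self.blocks(x, mask=mask)                 # causal self-attention
            probs = torch.softmax(self.head(h[:, -1]), dim=-1)
            nxt = torch.multinomial(probs, 1, generator=g)
            tokens = torch.cat([tokens, nxt], dim=1)
        return tokens[:, 1:]                              # drop BOS
```

Passing the sampled tokens through the codebook's embeddings then yields the continuous style condition for the T2I diffusion model, closing the loop with the inference sketch above.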
👥 Authors
Huijie Liu, Beihang University
Shuhao Cui, Kolors Team, Kuaishou Technology
Haoxiang Cao, South China Normal University
Shuai Ma, Beihang University
Kai Wu, Kolors Team, Kuaishou Technology
Guoliang Kang, Professor, Beihang University (deep learning and its applications)