NumColor: Precise Numeric Color Control in Text-to-Image Generation

📅 2026-03-13
🤖 AI Summary
This work addresses the challenge of precise numerical color control in text-to-image diffusion models, which struggle to interpret hexadecimal or RGB color codes due to subword tokenization that fragments these codes into semantically meaningless units. To overcome this limitation, the authors propose the first framework enabling zero-shot cross-model numerical color control. Their approach introduces a Color Token Aggregator to recognize color codes in arbitrary formats and maps them into a learnable ColorBook embedding within the perceptually uniform CIE Lab color space. Directional alignment and interpolation consistency losses are incorporated to ensure geometric fidelity between the learned embeddings and the underlying color space. Evaluated on a synthetically generated dataset, NumColor-Data, the method demonstrates strong compatibility across diverse diffusion architectures, achieving 4–9× improvements in color accuracy and 10–30× gains in color harmony scores on GenColorBench across five mainstream models.
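The summary above notes that the Color Token Aggregator must recognize color codes "in arbitrary formats." The paper's component operates on token sequences inside the text encoder, but the kinds of inputs it must handle can be illustrated with a plain-text sketch; the regexes and function below are illustrative assumptions, not the paper's implementation.

```python
import re

# Hedged sketch: recognize the two numerical color formats named in the paper,
# hex codes (#FF5733) and RGB values (rgb(255,87,51)), in a text prompt.
# The actual Color Token Aggregator works on (possibly fragmented) subword
# tokens, not raw strings.
HEX_RE = re.compile(r"#([0-9A-Fa-f]{6})")
RGB_RE = re.compile(r"rgb\(\s*(\d{1,3})\s*,\s*(\d{1,3})\s*,\s*(\d{1,3})\s*\)")

def find_color_codes(prompt: str) -> list[tuple[int, int, int]]:
    """Return every color specification in the prompt as an (R, G, B) triple."""
    colors = []
    for m in HEX_RE.finditer(prompt):
        h = m.group(1)
        colors.append(tuple(int(h[i:i + 2], 16) for i in (0, 2, 4)))
    for m in RGB_RE.finditer(prompt):
        colors.append(tuple(int(v) for v in m.groups()))
    return colors

print(find_color_codes("a car in #FF5733 next to a wall in rgb(255, 87, 51)"))
# both notations denote the same color: [(255, 87, 51), (255, 87, 51)]
```

Note how subword tokenization would split `#FF5733` into fragments like `#`, `FF`, `57`, `33`, which is exactly the failure mode the aggregator is designed to bypass.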

📝 Abstract
Text-to-image diffusion models excel at generating images from natural language descriptions, yet fail to interpret numerical colors such as hex codes (#FF5733) and RGB values (rgb(255,87,51)). This limitation stems from subword tokenization, which fragments color codes into semantically meaningless tokens that text encoders cannot map to coherent color representations. We present NumColor, a framework that enables precise numerical color control across multiple diffusion architectures. NumColor comprises two components: a Color Token Aggregator that detects color specifications regardless of tokenization, and a ColorBook of 6,707 learnable embeddings that maps colors, organized in the perceptually uniform CIE Lab space, into the text encoder's embedding space. We introduce two auxiliary losses, directional alignment and interpolation consistency, to enforce geometric correspondence between Lab and embedding spaces, enabling smooth color interpolation. To train the ColorBook, we construct NumColor-Data, a synthetic dataset of 500K rendered images with unambiguous color-to-pixel correspondence, eliminating the annotation ambiguity inherent in photographic datasets. Although trained solely on FLUX, NumColor transfers zero-shot to SD3, SD3.5, PixArt-α, and PixArt-Σ without model-specific adaptation. NumColor improves numerical color accuracy by 4-9x across five models, while simultaneously improving color harmony scores by 10-30x on the GenColorBench benchmark.
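The abstract grounds the ColorBook in the perceptually uniform CIE Lab space, where equal geometric steps correspond to roughly equal perceived color differences, so linear interpolation between two colors is meaningful. The conversion below is the standard sRGB-to-Lab transform (D65 white point), given as a self-contained sketch; the paper's losses themselves are not reproduced here.

```python
# Hedged sketch: standard sRGB -> CIE Lab conversion (D65 white point), the
# space in which the ColorBook embeddings are organized. Interpolating in Lab
# rather than raw RGB is what makes an interpolation consistency loss sensible.

def srgb_to_lab(r: int, g: int, b: int) -> tuple[float, float, float]:
    # 1. Undo the sRGB gamma to get linear light in [0, 1].
    def lin(c):
        c /= 255.0
        return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4
    rl, gl, bl = lin(r), lin(g), lin(b)
    # 2. Linear RGB -> CIE XYZ (sRGB primaries, D65 illuminant).
    x = 0.4124564 * rl + 0.3575761 * gl + 0.1804375 * bl
    y = 0.2126729 * rl + 0.7151522 * gl + 0.0721750 * bl
    z = 0.0193339 * rl + 0.1191920 * gl + 0.9503041 * bl
    # 3. XYZ -> Lab relative to the D65 reference white.
    xn, yn, zn = 0.95047, 1.0, 1.08883
    def f(t):
        return t ** (1 / 3) if t > (6 / 29) ** 3 else t / (3 * (6 / 29) ** 2) + 4 / 29
    fx, fy, fz = f(x / xn), f(y / yn), f(z / zn)
    return 116 * fy - 16, 500 * (fx - fy), 200 * (fy - fz)

def lerp_lab(c1, c2, t):
    """Linear interpolation between two sRGB colors, carried out in Lab --
    the geometry the interpolation consistency loss asks embeddings to mirror."""
    lab1, lab2 = srgb_to_lab(*c1), srgb_to_lab(*c2)
    return tuple((1 - t) * a + t * b for a, b in zip(lab1, lab2))
```

The interpolation consistency idea, as described, then amounts to asking that the embedding of an interpolated color stay close to the interpolation of the two endpoint embeddings.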
Problem

Research questions and friction points this paper is trying to address.

text-to-image generation
numeric color control
diffusion models
color representation
subword tokenization
Innovation

Methods, ideas, or system contributions that make the work stand out.

numerical color control
diffusion models
color embedding
zero-shot transfer
CIE Lab space
Muhammad Atif Butt
Ph.D. Candidate, Computer Vision Center, Universitat Autònoma de Barcelona
Computer Vision, Generative AI, Autonomous Driving, Adversarial ML
Diego Hernandez
Computer Vision Center, Spain; Computer Sciences Department, Universitat Autònoma de Barcelona, Spain
Alexandra Gomez-Villa
Assistant Professor, Universitat Autònoma de Barcelona & Researcher, Computer Vision Center
Computer vision, Machine learning, Visual perception
Kai Wang
Program of Computer Science, City University of Hong Kong (Dongguan); Computer Vision Center, Spain; Computer Sciences Department, Universitat Autònoma de Barcelona, Spain; City University of Hong Kong
Javier Vazquez-Corral
Computer Vision Center, Spain; Computer Sciences Department, Universitat Autònoma de Barcelona, Spain
Joost Van De Weijer
Computer Vision Center, Spain; Computer Sciences Department, Universitat Autònoma de Barcelona, Spain