GenColorBench: A Color Evaluation Benchmark for Text-to-Image Generation Models

📅 2025-10-23
🤖 AI Summary
Current text-to-image models exhibit significant deficiencies in fine-grained color control, and mainstream evaluation benchmarks lack systematic assessment of RGB numerical understanding, compatibility with standardized color systems, and alignment with human perception. Method: We introduce the first comprehensive benchmark for color controllability, covering 400+ colors and 44K structured prompts and integrating multi-level naming schemes (e.g., ISCC-NBS, CSS3/X11) with precise RGB specifications. We propose an evaluation framework that combines perceptual and automated assessments to jointly quantify color semantic comprehension, numerical fidelity, and visual consistency. Contribution/Results: Our empirical analysis reveals pronounced performance gaps and failure modes across color categories in state-of-the-art models. The benchmark establishes a reproducible, rigorously validated standard for evaluating color generation accuracy and provides empirically grounded directions for model improvement.

📝 Abstract
Recent years have seen impressive advances in text-to-image generation, with image generative or unified models producing high-quality images from text. Yet these models still struggle with fine-grained color controllability, often failing to accurately match colors specified in text prompts. While existing benchmarks evaluate compositional reasoning and prompt adherence, none systematically assess color precision. Color is fundamental to human visual perception and communication, critical for applications from art to design workflows requiring brand consistency. However, current benchmarks either neglect color or rely on coarse assessments, missing key capabilities such as interpreting RGB values or aligning with human expectations. To this end, we propose GenColorBench, the first comprehensive benchmark for text-to-image color generation, grounded in color systems like ISCC-NBS and CSS3/X11, including numerical colors which are absent elsewhere. With 44K color-focused prompts covering 400+ colors, it reveals models' true capabilities via perceptual and automated assessments. Evaluations of popular text-to-image models using GenColorBench show performance variations, highlighting which color conventions models understand best and identifying failure modes. Our GenColorBench assessments will guide improvements in precise color generation. The benchmark will be made public upon acceptance.
Problem

Research questions and friction points this paper is trying to address.

Evaluating color precision in text-to-image generation models
Assessing RGB value interpretation and human expectation alignment
Identifying failure modes in fine-grained color controllability
Innovation

Methods, ideas, or system contributions that make the work stand out.

Benchmark evaluates color precision in text-to-image models
Uses ISCC-NBS and CSS3 color systems with numerical values
Assesses 400+ colors through perceptual and automated evaluations
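The automated side of such an evaluation can be sketched as checking a generated image's dominant color against the RGB value of the color named in the prompt. The snippet below is an illustrative sketch, not the paper's actual metric: the function names, the mean-color stand-in for dominant-color extraction, and the tolerance value are all assumptions.

```python
# Hypothetical sketch of an automated color-accuracy check.
# CSS3_COLORS holds standard CSS3/X11 name-to-RGB mappings (a small subset);
# the matching logic and tolerance are illustrative, not from the paper.
from math import sqrt

CSS3_COLORS = {
    "crimson": (220, 20, 60),
    "teal": (0, 128, 128),
    "goldenrod": (218, 165, 32),
}

def mean_color(pixels):
    """Average RGB over a list of (r, g, b) pixels — a crude stand-in
    for dominant-color extraction from a generated image."""
    n = len(pixels)
    return tuple(sum(p[i] for p in pixels) / n for i in range(3))

def rgb_error(generated, target):
    """Euclidean distance in RGB space (0 = exact match)."""
    return sqrt(sum((g - t) ** 2 for g, t in zip(generated, target)))

def color_match(pixels, color_name, tolerance=30.0):
    """True if the image's mean color lies within `tolerance` of the
    RGB value registered for `color_name`."""
    return rgb_error(mean_color(pixels), CSS3_COLORS[color_name]) <= tolerance

# A uniformly crimson image matches "crimson" but not "teal".
pixels = [(220, 20, 60)] * 16
print(color_match(pixels, "crimson"))  # True
print(color_match(pixels, "teal"))     # False
```

A perceptually grounded variant would convert to CIELAB and use a delta-E metric instead of raw RGB distance, since equal RGB distances do not correspond to equal perceived differences.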
Muhammad Atif Butt
Ph.D. Candidate, Computer Vision Center, Universitat Autònoma de Barcelona
Computer Vision, Generative AI, Autonomous Driving, Adversarial ML
Alexandra Gomez-Villa
Assistant Professor, Universitat Autònoma de Barcelona & Researcher, Computer Vision Center
Computer vision, Machine learning, Visual perception
Tao Wu
Computer Vision Center, Spain; Computer Sciences Department, Universitat Autònoma de Barcelona, Spain
Javier Vazquez-Corral
Computer Vision Center, Spain; Computer Sciences Department, Universitat Autònoma de Barcelona, Spain
Joost Van De Weijer
Computer Vision Center, Spain; Computer Sciences Department, Universitat Autònoma de Barcelona, Spain
Kai Wang
Computer Vision Center, Spain; Program of Computer Science, City University of Hong Kong (Dongguan); City University of Hong Kong