AI Summary
This work addresses the prevalent issue of oversaturated colors and exaggerated contrast in text-to-image (T2I) generation, a problem that undermines visual realism and stems in part from biased evaluation metrics. The study presents the first systematic formulation and quantification of "color fidelity," introducing a large-scale ordered dataset, CFD, comprising 1.3 million real and synthetic images. It proposes the Color Fidelity Metric (CFM), built on a multimodal perception-based encoder, which significantly outperforms existing metrics in assessing color authenticity. Furthermore, the authors develop a training-free Color Fidelity Refinement (CFR) method that dynamically modulates spatial-temporal guidance strength to mitigate over-saturation across multiple T2I models. This approach enhances perceptual realism and establishes a closed-loop framework for both evaluating and improving color fidelity in synthetic imagery.
Abstract
Recent advances in text-to-image (T2I) generation have greatly improved visual quality, yet producing images that look as authentic as real-world photographs remains challenging. This is partly due to biases in existing evaluation paradigms: human ratings and preference-trained metrics often favor vivid images with exaggerated saturation and contrast, which makes generations too vivid to look real even when the prompt asks for a realistic style. To address this issue, we present the Color Fidelity Dataset (CFD) and the Color Fidelity Metric (CFM) for objective evaluation of color fidelity in realistic-style generations. CFD contains over 1.3M real and synthetic images with ordered levels of color realism, while CFM employs a multimodal encoder to learn perceptual color fidelity. In addition, we propose a training-free Color Fidelity Refinement (CFR) method that adaptively modulates the spatial-temporal guidance scale during generation, thereby enhancing color authenticity. Together, CFD supports CFM for assessment, and CFM's learned attention in turn guides CFR to refine T2I fidelity, forming a progressive framework for assessing and improving color fidelity in realistic-style T2I generation. The dataset and code are available at https://github.com/ZhengyaoFang/CFM.
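To make the idea of a spatial-temporal guidance scale concrete, the following is a minimal sketch and not the authors' implementation. It assumes standard classifier-free guidance, where the guided noise prediction is `eps_uncond + w * (eps_cond - eps_uncond)`, and replaces the usual scalar `w` with a map that varies per pixel and per denoising step. The function names, the linear schedule, the `w_max`/`w_min` values, and the way the attention map is used are all hypothetical choices made for illustration.

```python
import numpy as np


def guidance_scale_map(attention, step, num_steps, w_max=7.5, w_min=2.0):
    """Hypothetical schedule: reduce guidance where `attention` is high,
    and reduce it more strongly at later denoising steps.

    attention: array of shape (H, W) with values in [0, 1] (assumed to flag
               regions prone to over-saturation).
    step:      index of the current denoising step, 0 = first step.
    """
    progress = step / max(num_steps - 1, 1)  # 0.0 at the first step, 1.0 at the last
    strength = progress * attention          # per-pixel reduction factor in [0, 1]
    return w_max - (w_max - w_min) * strength


def guided_noise(eps_uncond, eps_cond, w):
    """Classifier-free guidance with a scalar or per-pixel scale `w`.

    eps_uncond, eps_cond: arrays of shape (C, H, W).
    w: scalar or array of shape (H, W), broadcast over channels.
    """
    return eps_uncond + w * (eps_cond - eps_uncond)


if __name__ == "__main__":
    attention = np.array([[0.0, 1.0], [0.5, 0.25]])
    eps_uncond = np.zeros((3, 2, 2))
    eps_cond = np.ones((3, 2, 2))
    for step in (0, 9):
        w = guidance_scale_map(attention, step, num_steps=10)
        print(step, guided_noise(eps_uncond, eps_cond, w)[0])
```

With this schedule, every pixel starts at the full guidance scale, and only high-attention regions are pulled toward `w_min` as sampling proceeds; the actual CFR modulation rule may differ.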