🤖 AI Summary
Current text-driven image editing methods suffer from two bottlenecks in color control: low color accuracy and poor interpolation continuity. Specifically, they struggle to constrain output colors precisely within a target RGB range, and they lack an interpretable mapping between interpolation coefficients and actual RGB changes. To address these limitations, we propose a diffusion-based, fine-grained, color-controllable editing method. Its core innovation is an explicit, invertible mapping from RGB values to the text embedding space, realized via a lightweight color mapping module that, for the first time, enables reverse inference of semantically consistent text embeddings from target RGB vectors. This module decouples color representation from semantic content, enabling continuous, perceptually smooth color transitions within user-specified gamuts while preserving structural and semantic fidelity. Extensive experiments demonstrate significant improvements over state-of-the-art methods in color accuracy, interpolation smoothness, and user controllability.
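The color mapping module described above can be pictured as a small network from RGB space into the text-embedding space. The sketch below is a toy stand-in, not the paper's architecture: the two-layer MLP, the hidden width, the 768-dimensional (CLIP-style) embedding size, and the random weights are all assumptions made purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions: 768 matches common CLIP-style text embeddings;
# the paper's actual module architecture is not specified here.
EMB_DIM, HIDDEN = 768, 128

# A lightweight color-mapping module sketched as a tiny 2-layer MLP that
# maps a normalized RGB vector in [0, 1]^3 to a text-embedding-space vector.
W1 = rng.normal(0.0, 0.1, (3, HIDDEN))
b1 = np.zeros(HIDDEN)
W2 = rng.normal(0.0, 0.1, (HIDDEN, EMB_DIM))
b2 = np.zeros(EMB_DIM)

def rgb_to_embedding(rgb):
    """Map an RGB triple (0-255) to a color-conditioning embedding."""
    x = np.asarray(rgb, dtype=np.float64) / 255.0  # normalize to [0, 1]
    h = np.tanh(x @ W1 + b1)                       # hidden non-linearity
    return h @ W2 + b2                             # linear read-out

emb = rgb_to_embedding([200, 30, 30])  # a reddish target color
print(emb.shape)                       # (768,)
```

In a real system these weights would be trained so that the predicted embedding, injected as the diffusion model's text conditioning, reproduces the target color while leaving semantics untouched; here they only illustrate the shape of the mapping.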
📝 Abstract
In recent years, text-driven image editing has made significant progress. However, due to the inherent ambiguity and discreteness of natural language, color editing still faces challenges such as insufficient precision and difficulty in achieving continuous control. Although linearly interpolating the embedding vectors of different textual descriptions can guide the model to generate a sequence of images with varying colors, this approach lacks precise control over the range of color changes in the output images. Moreover, the relationship between the interpolation coefficient and the resulting image color is unknown and uncontrollable. To address these issues, we introduce a color mapping module that explicitly models the correspondence between the text embedding space and image RGB values. This module predicts the corresponding embedding vector based on a given RGB value, enabling precise color control of the generated images while maintaining semantic consistency. Users can specify a target RGB range to generate images with continuous color variations within the desired range, thereby achieving finer-grained, continuous, and controllable color editing. Experimental results demonstrate that our method performs well in terms of color continuity and controllability.
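The contrast drawn in the abstract can be made concrete: instead of interpolating directly between two text embeddings, where the coefficient's effect on color is opaque, one can interpolate inside a user-specified RGB range and map each intermediate color into embedding space, so the coefficient corresponds linearly to the output color. The sketch below uses a toy linear map as a placeholder for the learned color mapping module; the embedding dimension and all numeric values are assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy stand-in for the learned color mapping module: a random linear
# RGB -> embedding map with a small, hypothetical embedding size.
EMB_DIM = 8
color_map = rng.normal(0.0, 0.1, (3, EMB_DIM))

def embed_rgb(rgb):
    """Placeholder RGB -> embedding mapping (linear for illustration)."""
    return (np.asarray(rgb, dtype=np.float64) / 255.0) @ color_map

# User-specified target RGB range: deep red -> light red.
rgb_lo = np.array([120.0, 20.0, 20.0])
rgb_hi = np.array([250.0, 90.0, 90.0])

# Interpolate in RGB space first, THEN map each color into embedding
# space: the coefficient t now relates linearly to the output color,
# unlike direct interpolation between two text embeddings.
ts = np.linspace(0.0, 1.0, 5)
for t in ts:
    rgb_t = (1.0 - t) * rgb_lo + t * rgb_hi  # interpretable color at t
    cond = embed_rgb(rgb_t)                  # conditioning vector for the model
    print(round(float(t), 2), rgb_t.astype(int))
```

This is the interpretability the module is meant to provide: t = 0 and t = 1 land exactly on the user's range endpoints, and intermediate t values trace a perceptually predictable path between them.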