🤖 AI Summary
Existing text-to-image generation models struggle with fine-grained color control, primarily because the semantic encoding mechanisms of their latent spaces are poorly understood. This work addresses that limitation by analyzing the latent space of the FLUX.1 [Dev] variational autoencoder, revealing an implicit hue-saturation-lightness (HSL) structure without requiring any additional training. Building on this insight, the authors propose a closed-form latent-space manipulation method that explicitly models and controls the color subspace. The approach enables precise prediction and controllable editing of colors in generated images, empirically validating the existence and efficacy of a Latent Color Subspace (LCS). The implementation is publicly available.
📝 Abstract
Text-to-image generation models have advanced rapidly, yet achieving fine-grained control over generated images remains difficult, largely due to limited understanding of how semantic information is encoded. We develop an interpretation of the color representation in the Variational Autoencoder latent space of FLUX.1 [Dev], revealing a structure reflecting Hue, Saturation, and Lightness. We verify our Latent Color Subspace (LCS) interpretation by demonstrating that it can both predict and explicitly control color, introducing a fully training-free method in FLUX based solely on closed-form latent-space manipulation. Code is available at https://github.com/ExplainableML/LCS.
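The core idea, identifying a color direction in a latent space in closed form and editing along it without any training, can be illustrated on synthetic data. The sketch below is a toy analogy, not the paper's actual method: it uses a made-up low-dimensional "latent" with a hidden lightness attribute, recovers the attribute's direction via least squares (a closed-form operation), and shifts a latent along it. The dimensionality, noise level, and attribute are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 16, 200  # toy latent dimensionality and sample count (assumptions)

# Synthetic "latents": a hidden lightness value is linearly embedded along an
# unknown unit direction w_true, plus noise standing in for image content.
w_true = rng.normal(size=d)
w_true /= np.linalg.norm(w_true)
lightness = rng.uniform(-1.0, 1.0, size=n)
latents = lightness[:, None] * w_true + 0.1 * rng.normal(size=(n, d))

# Closed-form estimate of the color direction: center both variables and
# regress latents on the attribute. No iterative training is involved.
l_c = lightness - lightness.mean()
z_c = latents - latents.mean(axis=0)
w_hat = (l_c @ z_c) / (l_c @ l_c)
w_hat /= np.linalg.norm(w_hat)

# Edit: shifting a latent along the recovered direction changes the encoded
# attribute while leaving the orthogonal (content) components untouched.
z_edit = latents[0] + 0.5 * w_hat

print(f"alignment with true direction: {abs(float(w_hat @ w_true)):.3f}")
```

In the paper's setting, the latents would come from the FLUX.1 [Dev] VAE encoder and the recovered subspace reflects hue, saturation, and lightness rather than a single scalar attribute; the toy example only demonstrates why a linear, closed-form recovery-and-edit pipeline is plausible when an attribute is linearly encoded.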