AceTone: Bridging Words and Colors for Conditional Image Grading

📅 2026-04-01
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing image color grading methods struggle to accommodate diverse creative intents and often exhibit insufficient alignment with human aesthetic preferences. This work proposes a multimodal conditional generative framework for color transformation that, for the first time, unifies text and reference image guidance within a single architecture to stylize images via 3D lookup tables (3D-LUTs). The core innovations include compressing a 3×32³ LUT into 64 discrete tokens using a vector-quantized variational autoencoder (VQ-VAE) and enhancing perceptual aesthetic alignment through a vision-language model coupled with reinforcement learning. Experiments demonstrate that the proposed method achieves state-of-the-art performance in both text- and reference-guided color grading, improving the LPIPS metric by up to 50% and significantly outperforming existing approaches in human evaluations of visual appeal and style consistency.
📝 Abstract
Color affects how we interpret image style and emotion. Previous color grading methods rely on patch-wise recoloring or fixed filter banks, struggling to generalize across creative intents or align with human aesthetic preferences. In this study, we propose AceTone, the first approach that supports multimodal conditioned color grading within a unified framework. AceTone formulates grading as a generative color transformation task, where a model directly produces 3D-LUTs conditioned on text prompts or reference images. We develop a VQ-VAE-based tokenizer which compresses a $3\times32^3$ LUT vector to 64 discrete tokens with $\Delta E < 2$ fidelity. We further build a large-scale dataset, AceTone-800K, and train a vision-language model to predict LUT tokens, followed by reinforcement learning to align outputs with perceptual fidelity and aesthetics. Experiments show that AceTone achieves state-of-the-art performance on both text-guided and reference-guided grading tasks, improving LPIPS by up to 50% over existing methods. Human evaluations confirm that AceTone's results are visually pleasing and stylistically coherent, demonstrating a new pathway toward language-driven, aesthetic-aligned color grading.
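For context on the output representation: a 3D-LUT maps each input RGB value to an output RGB value by sampling a lattice of color entries (here 32³ entries × 3 channels) with trilinear interpolation. A minimal sketch of applying such a LUT to an image — a generic illustration of how 3D-LUTs work, not code from the paper:

```python
import numpy as np

def apply_3d_lut(image, lut):
    """Apply a 3D-LUT of shape (N, N, N, 3) to a float RGB image in [0, 1]
    using trilinear interpolation over the enclosing LUT cell."""
    n = lut.shape[0]
    idx = image * (n - 1)            # scale RGB into LUT index space
    lo = np.floor(idx).astype(int)
    hi = np.minimum(lo + 1, n - 1)
    frac = idx - lo

    out = np.zeros_like(image)
    # Weighted sum over the 8 corners of the cell containing each pixel.
    for dr in (0, 1):
        for dg in (0, 1):
            for db in (0, 1):
                r = hi[..., 0] if dr else lo[..., 0]
                g = hi[..., 1] if dg else lo[..., 1]
                b = hi[..., 2] if db else lo[..., 2]
                w = ((frac[..., 0] if dr else 1 - frac[..., 0])
                     * (frac[..., 1] if dg else 1 - frac[..., 1])
                     * (frac[..., 2] if db else 1 - frac[..., 2]))
                out += w[..., None] * lut[r, g, b]
    return out

# Identity LUT: each entry stores its own normalized RGB coordinate,
# so applying it should leave the image unchanged.
n = 32
axis = np.linspace(0.0, 1.0, n)
identity_lut = np.stack(np.meshgrid(axis, axis, axis, indexing="ij"), axis=-1)

img = np.random.rand(4, 4, 3)
graded = apply_3d_lut(img, identity_lut)
```

A learned grading is just a non-identity lattice: shifting the LUT entries warps the whole color space of the image at once, which is why a 3×32³ table is such a compact grading target.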
Problem

Research questions and friction points this paper is trying to address.

color grading
creative intent
aesthetic preference
generalization
image style
Innovation

Methods, ideas, or system contributions that make the work stand out.

conditional color grading
3D-LUT generation
VQ-VAE tokenizer
vision-language model
reinforcement learning for aesthetics
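On the VQ-VAE tokenizer: vector quantization snaps each encoder latent to its nearest entry in a learned codebook, and the entry indices become the discrete tokens the vision-language model predicts. A minimal sketch of that quantization step, with all sizes except the 64-token count chosen for illustration rather than taken from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: 64 tokens per LUT (as in the paper), with an
# illustrative codebook of 512 entries and 16-dimensional latents.
num_tokens, codebook_size, dim = 64, 512, 16

codebook = rng.normal(size=(codebook_size, dim))  # learned embeddings e_k
latents = rng.normal(size=(num_tokens, dim))      # encoder output z_e(x)

# Vector quantization: snap each latent to its nearest codebook entry.
dists = ((latents[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
tokens = dists.argmin(axis=1)    # 64 discrete token ids
quantized = codebook[tokens]     # z_q(x), decoded back into a 3x32^3 LUT
```

Because the LUT is now a short sequence of discrete ids, a vision-language model can predict it autoregressively like text, which is what makes the unified text/reference conditioning possible.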
Tianren Ma
University of Chinese Academy of Sciences
Mingxiang Liao
ByteDance
Xijin Zhang
ByteDance
Qixiang Ye
University of Chinese Academy of Sciences, University of Maryland
Visual Object Detection · Image Processing