🤖 AI Summary
Existing image retouching methods rely on global pixel-wise color mapping, neglecting semantic content variations and user-specific style preferences; this leads to color distortion, regional inconsistency, and inadequate style alignment. To address these limitations, we propose a content-adaptive curve mapping framework integrated with attribute-driven textual representation learning. Our approach introduces a novel multi-basis curve mapping mechanism that enables semantic-region-aware, context-sensitive color adjustment. We further design an attribute text prediction module that generates interpretable, fine-grained style descriptions, and construct a vision-language cross-modal fusion architecture coupled with a learnable weight-map estimation module for adaptive spatial modulation. Evaluated on multiple public benchmarks, our method achieves state-of-the-art performance, significantly improving color fidelity, regional consistency, and alignment with user-defined stylistic preferences.
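To make the multi-basis curve mapping concrete, here is a minimal PyTorch sketch: K basis curves each map the whole image globally, and a small CNN predicts a per-pixel softmax weight map over the K bases, so pixels with similar colors can receive different adjustments depending on spatial context. The gamma-shaped bases, the 3-layer weight predictor, and all layer sizes are illustrative assumptions, not the paper's exact design.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ContentAdaptiveCurveMapping(nn.Module):
    """Sketch of multi-basis curve mapping with learnable weight maps.

    Each of K basis curves defines a global color mapping; a small CNN
    predicts per-pixel weights over the K bases, yielding a content-
    adaptive blend. The gamma-curve bases and the tiny weight network
    are assumptions for illustration only.
    """

    def __init__(self, num_bases: int = 8):
        super().__init__()
        # Hypothetical bases: one learnable gamma exponent per curve.
        self.log_gammas = nn.Parameter(torch.linspace(-1.0, 1.0, num_bases))
        # Hypothetical weight-map estimator: 3 conv layers -> K maps.
        self.weight_net = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, num_bases, 3, padding=1),
        )

    def forward(self, img: torch.Tensor) -> torch.Tensor:
        # img: (B, 3, H, W) in [0, 1]
        gammas = self.log_gammas.exp()  # (K,)
        # Apply every basis curve to the whole image: (B, K, 3, H, W).
        curves = img.unsqueeze(1).clamp(min=1e-6) ** gammas.view(1, -1, 1, 1, 1)
        # Per-pixel soft weights over the K bases: (B, K, 1, H, W).
        weights = F.softmax(self.weight_net(img), dim=1).unsqueeze(2)
        # Content-adaptive output: spatially weighted sum of basis mappings.
        return (weights * curves).sum(dim=1)

if __name__ == "__main__":
    x = torch.rand(1, 3, 64, 64)
    y = ContentAdaptiveCurveMapping()(x)
    print(y.shape)  # torch.Size([1, 3, 64, 64])
```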
📝 Abstract
Image retouching has received significant attention for its ability to produce high-quality visual content. Existing approaches mainly rely on uniform pixel-wise color mapping across entire images, neglecting the inherent color variations induced by image content. This limitation prevents existing approaches from achieving adaptive retouching that accommodates both diverse color distributions and user-defined style preferences. To address these challenges, we propose a novel Content-Adaptive image retouching method guided by Attribute-based Text Representation (CA-ATP). Specifically, we propose a content-adaptive curve mapping module, which leverages a series of basis curves to establish multiple color mapping relationships and learns the corresponding weight maps, enabling content-aware color adjustments. The module captures color diversity within the image content, allowing similar color values to receive distinct transformations based on their spatial context. In addition, we design an attribute text prediction module that generates text representations from multiple image attributes, explicitly encoding user-defined style preferences. These attribute-based text representations are then integrated with visual features via a multimodal model, providing user-friendly guidance for image retouching. Extensive experiments on several public datasets demonstrate that our method achieves state-of-the-art performance.
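The cross-modal fusion step can be sketched in the same spirit. In the snippet below, a hypothetical embedding table stands in for the text encoder that turns predicted attribute words (e.g., "warm", "high-contrast") into tokens, and a single cross-attention layer injects those style cues into flattened visual features. The paper integrates the two modalities through a full multimodal (vision-language) model, so every module name, vocabulary size, and dimension here is an assumption.

```python
import torch
import torch.nn as nn

class AttributeTextFusion(nn.Module):
    """Sketch of fusing attribute-based text tokens with visual features
    via cross-attention. The attribute vocabulary, embedding table, and
    single attention layer are stand-ins for the multimodal model used
    in the actual method.
    """

    def __init__(self, num_attrs: int = 16, dim: int = 256):
        super().__init__()
        # Hypothetical lookup standing in for a text encoder: each
        # predicted style attribute maps to one token embedding.
        self.attr_embed = nn.Embedding(num_attrs, dim)
        self.cross_attn = nn.MultiheadAttention(dim, num_heads=4, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, vis_feats: torch.Tensor, attr_ids: torch.Tensor) -> torch.Tensor:
        # vis_feats: (B, N, dim) flattened spatial visual tokens
        # attr_ids:  (B, T) indices of predicted style attributes
        text_tokens = self.attr_embed(attr_ids)  # (B, T, dim)
        # Visual tokens query the attribute tokens, injecting style cues.
        fused, _ = self.cross_attn(vis_feats, text_tokens, text_tokens)
        return self.norm(vis_feats + fused)

if __name__ == "__main__":
    vis = torch.randn(2, 64, 256)           # e.g. an 8x8 feature map, flattened
    attrs = torch.randint(0, 16, (2, 3))    # three predicted attributes per image
    print(AttributeTextFusion()(vis, attrs).shape)  # torch.Size([2, 64, 256])
```

A residual connection around the attention output keeps the visual features intact when the attribute guidance is weak, which is a common design choice in this kind of fusion, though not confirmed by the abstract.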