RetouchLLM: Training-free White-box Image Retouching

📅 2025-10-09

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Existing learning-based image retouching methods rely on large-scale paired training data and operate as opaque black-box models, which limits their interpretability and user controllability. To address these limitations, we propose the first training-free, white-box, code-driven retouching framework. A visual critic assesses the discrepancies between the input and reference images; the system then reasons over multiple steps, with natural-language understanding, to generate executable and interpretable Python retouching code. Mimicking the multi-stage way humans edit photos, the framework supports high-resolution image processing, generalization across styles, and interactive adjustments guided by natural language. Experiments show that the approach produces high-fidelity retouching results across diverse tasks, accurately follows user intent, and supports fine-grained personalized control, making the retouching pipeline more transparent, adaptable, and responsive to the user.

📝 Abstract

Image retouching not only enhances visual quality but also serves as a means of expressing personal preferences and emotions. However, existing learning-based approaches require large-scale paired data and operate as black boxes, making the retouching process opaque and limiting their ability to handle diverse, user- or image-specific adjustments. In this work, we propose RetouchLLM, a training-free white-box image retouching system, which requires no training data and performs interpretable, code-based retouching directly on high-resolution images. Our framework progressively enhances the image in a manner similar to how humans perform multi-step retouching, allowing exploration of diverse adjustment paths. It comprises two main modules: a visual critic that identifies differences between the input and reference images, and a code generator that produces executable code. Experiments demonstrate that our approach generalizes well across diverse retouching styles, while natural language-based user interaction enables interpretable and controllable adjustments tailored to user intent.
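The progressive loop described in the abstract (a critic compares the current image with the reference, a generator writes code, the code is executed, and the process repeats) can be sketched in Python roughly as follows. This is a minimal sketch under loud assumptions, not the authors' implementation: in RetouchLLM both modules are model-driven, whereas here `toy_critic` compares only mean HLS lightness and saturation, and `toy_code_generator` emits one rule-based line per step. All names (`adjust_exposure`, `adjust_saturation`, `toy_critic`, `toy_code_generator`, `retouch`) are hypothetical, and images are plain lists of RGB tuples in [0, 1] to keep the example dependency-free.

```python
import colorsys

# Hypothetical sketch: an "image" is a list of (r, g, b) tuples with channels in [0, 1].

def clamp(x):
    """Keep a channel value inside [0, 1]."""
    return min(1.0, max(0.0, x))

def adjust_exposure(img, delta):
    """Add `delta` to every channel: a crude global brightness shift."""
    return [tuple(clamp(c + delta) for c in px) for px in img]

def adjust_saturation(img, factor):
    """Scale each pixel's HLS saturation by `factor`, keeping hue and lightness."""
    out = []
    for r, g, b in img:
        h, l, s = colorsys.rgb_to_hls(r, g, b)
        out.append(colorsys.hls_to_rgb(h, l, clamp(s * factor)))
    return out

def stats(img):
    """Return (mean HLS lightness, mean HLS saturation) of the image."""
    hls = [colorsys.rgb_to_hls(*px) for px in img]
    n = len(hls)
    return sum(p[1] for p in hls) / n, sum(p[2] for p in hls) / n

def toy_critic(current, reference, tol=0.02):
    """Stand-in for the visual critic: name the largest gap, or None if all gaps are below `tol`."""
    (l_cur, s_cur), (l_ref, s_ref) = stats(current), stats(reference)
    gaps = {"lightness": l_ref - l_cur, "saturation": s_ref - s_cur}
    name, gap = max(gaps.items(), key=lambda kv: abs(kv[1]))
    return None if abs(gap) < tol else (name, gap)

def toy_code_generator(feedback, current):
    """Stand-in for the code generator: turn the critique into one line of Python."""
    name, gap = feedback
    if name == "lightness":
        return f"img = adjust_exposure(img, {gap:.3f})"
    _, s_cur = stats(current)
    # A fully desaturated image cannot be rescaled, so fall back to a no-op factor.
    factor = (s_cur + gap) / s_cur if s_cur > 0 else 1.0
    return f"img = adjust_saturation(img, {factor:.3f})"

def retouch(img, reference, max_steps=5):
    """Critique, generate code, execute it, and repeat until the critic is satisfied."""
    history = []
    for _ in range(max_steps):
        feedback = toy_critic(img, reference)
        if feedback is None:
            break
        line = toy_code_generator(feedback, img)
        # Expose only the editing functions to the generated line (not a security sandbox).
        scope = {"img": img, "adjust_exposure": adjust_exposure,
                 "adjust_saturation": adjust_saturation}
        exec(line, {"__builtins__": {}}, scope)
        img = scope["img"]
        history.append(line)
    return img, history

if __name__ == "__main__":
    source = [(0.4, 0.35, 0.3), (0.5, 0.4, 0.35)]
    reference = [(0.6, 0.5, 0.4), (0.8, 0.6, 0.5)]
    result, history = retouch(source, reference)
    for line in history:
        print(line)
```

Because every step is a line of ordinary code, `history` doubles as a readable record of the edits, which is the sense in which such a process is white-box. Note that `exec` with empty builtins only limits name lookup; it is not isolation, and running model-generated code for real would need a proper sandbox.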

Problem

Research questions and friction points this paper is trying to address.

Eliminates the need for large-scale paired training data

Provides an interpretable, white-box image retouching process

Enables customizable adjustments based on user preferences
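As a rough illustration of why code-based (white-box) output makes adjustments customizable: a user preference can be honored by editing a visible parameter in the generated program rather than retraining a model. The sketch below rests on stated assumptions: the two-line `PROGRAM`, the `adjust_exposure`/`adjust_saturation` names, and the keyword table `RULES` are all hypothetical, and the keyword matching is a crude stand-in for the language model that, in RetouchLLM, interprets natural-language requests. The code only rewrites program text; it does not execute it.

```python
import re

# A program as a code generator might emit it; the values are made up for illustration.
PROGRAM = "\n".join([
    "img = adjust_exposure(img, 0.120)",
    "img = adjust_saturation(img, 1.350)",
])

# Hypothetical stand-in for the language model: map a phrase in the user's request
# to the editing function it concerns and a rule for updating that function's argument.
RULES = {
    "brighter": ("adjust_exposure", lambda v: v + 0.05),
    "darker": ("adjust_exposure", lambda v: v - 0.05),
    "more vivid": ("adjust_saturation", lambda v: v * 1.1),
    "less vivid": ("adjust_saturation", lambda v: v * 0.9),
}

# Matches lines of the form "img = <function>(img, <number>)".
CALL = re.compile(r"^img = (\w+)\(img, (-?\d+(?:\.\d+)?)\)$")

def apply_request(program: str, request: str) -> str:
    """Return `program` with the parameter targeted by `request` updated."""
    text = request.lower()
    match = next((rule for phrase, rule in RULES.items() if phrase in text), None)
    if match is None:
        return program  # Request not understood: leave the code untouched.
    func, update = match
    edited = []
    for line in program.splitlines():
        m = CALL.match(line)
        if m and m.group(1) == func:
            line = f"img = {func}(img, {update(float(m.group(2))):.3f})"
        edited.append(line)
    return "\n".join(edited)

if __name__ == "__main__":
    print(apply_request(PROGRAM, "Make it a bit brighter, please"))
    # prints:
    # img = adjust_exposure(img, 0.170)
    # img = adjust_saturation(img, 1.350)
```

The keyword table is deliberately simplistic; its only purpose here is to show that the change a user asks for lands in a single, inspectable number in the program.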

Innovation

Methods, ideas, or system contributions that make the work stand out.

Training-free white-box system for image retouching

Visual critic identifies input-reference image differences

Code generator produces executable retouching adjustments

🔎 Similar Papers

Face2Face: Label-driven Facial Retouching Restoration