VeraRetouch: A Lightweight Fully Differentiable Framework for Multi-Task Reasoning Photo Retouching

📅 2026-04-29

🤖 AI Summary

Existing photo retouching methods rely on non-differentiable external tools, which makes them hard to optimize, parameter-heavy, and poor at generalizing. This work proposes a lightweight, end-to-end differentiable retouching framework in which a 0.5B-parameter vision-language model interprets both image defects and semantic editing instructions. A fully differentiable Retouch Renderer enables pixel-level training, while decoupled control latents and data synthesized through inverse degradation improve generalization. The contributions are AetherRetouch-1M+, presented as the first million-scale professional retouching dataset; the differentiable renderer; and DAPO-AE, a reinforcement learning-based post-training strategy. The method reports state-of-the-art performance across multiple benchmarks at a significantly smaller model size, enabling efficient multi-task inference and mobile deployment.

📝 Abstract

Reasoning photo retouching has gained significant traction. It requires models to analyze image defects, explain their reasoning, and execute precise retouching enhancements. However, existing approaches often rely on non-differentiable external software, which creates optimization barriers, and they suffer from high parameter redundancy and limited generalization. To address these challenges, we propose VeraRetouch, a lightweight and fully differentiable framework for multi-task photo retouching. We employ a 0.5B Vision-Language Model (VLM) as the central intelligence to formulate retouching plans based on instructions and scene semantics. We also develop a fully differentiable Retouch Renderer that replaces external tools, enabling direct end-to-end pixel-level training through decoupled control latents for lighting, global color, and specific color adjustments. To overcome data scarcity, we introduce AetherRetouch-1M+, the first million-scale dataset for professional retouching, constructed via a new inverse degradation workflow. In addition, we propose DAPO-AE, a reinforcement learning post-training strategy that enhances autonomous aesthetic cognition. Extensive experiments demonstrate that VeraRetouch achieves state-of-the-art performance across multiple benchmarks while maintaining a significantly smaller footprint, enabling mobile deployment. Our code and models are publicly available at https://github.com/OpenVeraTeam/VeraRetouch.
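
The abstract's two central ideas, a differentiable renderer trained at the pixel level and training pairs built by inverse degradation, can be illustrated with a toy example. The sketch below is not the paper's Retouch Renderer: the two controls (`saturation`, `brightness`), the channel-mean luma, the hand-picked pixels, and every function name are assumptions made for illustration. A clean image is degraded with known settings to synthesize an input, and gradient descent through the renderer then recovers the edit that undoes the degradation.

```python
# Illustrative sketch only: a toy two-parameter renderer, not VeraRetouch's Retouch Renderer.

def render(pixels, saturation, brightness):
    """Apply a global saturation scale and brightness offset to RGB pixels.

    Clipping to [0, 1] is omitted so the operator stays linear in its controls.
    """
    out = []
    for r, g, b in pixels:
        luma = (r + g + b) / 3.0  # simplified luma: plain channel mean (assumption)
        out.append(tuple(luma + brightness + saturation * (c - luma) for c in (r, g, b)))
    return out


def loss_and_grads(pixels, target, saturation, brightness):
    """Mean squared pixel error and its analytic gradients w.r.t. both controls."""
    n = 3 * len(pixels)
    loss = grad_s = grad_b = 0.0
    rendered = render(pixels, saturation, brightness)
    for src, out, ref in zip(pixels, rendered, target):
        luma = sum(src) / 3.0
        for c_src, c_out, c_ref in zip(src, out, ref):
            err = c_out - c_ref
            loss += err * err / n
            grad_s += 2.0 * err * (c_src - luma) / n  # d(out)/d(saturation) = c - luma
            grad_b += 2.0 * err / n                   # d(out)/d(brightness) = 1
    return loss, grad_s, grad_b


def fit(pixels, target, steps=3000, lr=0.5):
    """Plain gradient descent on the two controls, starting from the identity edit."""
    saturation, brightness = 1.0, 0.0
    for _ in range(steps):
        _, grad_s, grad_b = loss_and_grads(pixels, target, saturation, brightness)
        saturation -= lr * grad_s
        brightness -= lr * grad_b
    return saturation, brightness


# Inverse-degradation style pair: start from a "clean" image and synthesize a worse input.
clean = [(0.8, 0.2, 0.1), (0.1, 0.6, 0.3), (0.2, 0.3, 0.9), (0.5, 0.5, 0.4)]
degraded = render(clean, saturation=0.5, brightness=-0.1)  # desaturate and darken

saturation, brightness = fit(degraded, clean)
print(f"recovered saturation={saturation:.4f}, brightness={brightness:.4f}")
```

Because the toy degradation halves saturation and subtracts 0.1 from brightness, the fitted controls should approach a saturation of 2.0 and a brightness of 0.1, the exact inverse edit. In this operator the two gradients do not interfere, since the per-pixel color offsets sum to zero; that is a small-scale analogue of keeping lighting and color controls decoupled, though the paper's actual latents and renderer may be structured quite differently.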

Problem

Research questions and friction points this paper is trying to address.

photo retouching

differentiable framework

multi-task reasoning

optimization barriers

generalization

Innovation

Methods, ideas, or system contributions that make the work stand out.

fully differentiable rendering

vision-language model

multi-task photo retouching