PerTouch: VLM-Driven Agent for Personalized and Semantic Image Retouching

📅 2025-11-17
📈 Citations: 0
Influential: 0
🤖 AI Summary
This paper addresses the challenge of simultaneously achieving high visual fidelity, user aesthetic alignment, and precise edit controllability in personalized image retouching. To this end, we propose the first end-to-end diffusion-based framework. Methodologically: (1) we integrate a vision-language model (VLM) to accurately parse natural-language editing instructions and model user intent; (2) we introduce semantic replacement and parameter perturbation mechanisms to enhance boundary-awareness; and (3) we design a feedback-driven rethinking module coupled with a scene-aware memory mechanism to enable long-term preference learning and multi-level instruction response. Extensive experiments demonstrate that our method significantly outperforms state-of-the-art approaches across multiple benchmarks. It exhibits robustness to both strong and weak instructions while supporting fine-grained control. To our knowledge, this is the first work to unify semantic editing, global aesthetic preservation, and personalized preference modeling within a single diffusion-based framework.

📝 Abstract
Image retouching aims to enhance visual quality while aligning with users' personalized aesthetic preferences. To address the challenge of balancing controllability and subjectivity, we propose a unified diffusion-based image retouching framework called PerTouch. Our method supports semantic-level image retouching while maintaining global aesthetics. Using parameter maps containing attribute values in specific semantic regions as input, PerTouch constructs an explicit parameter-to-image mapping for fine-grained image retouching. To improve semantic boundary perception, we introduce semantic replacement and parameter perturbation mechanisms in the training process. To connect natural language instructions with visual control, we develop a VLM-driven agent that can handle both strong and weak user instructions. Equipped with mechanisms of feedback-driven rethinking and scene-aware memory, PerTouch better aligns with user intent and captures long-term preferences. Extensive experiments demonstrate each component's effectiveness and the superior performance of PerTouch in personalized image retouching. Code is available at: https://github.com/Auroral703/PerTouch.
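As a rough illustration of the parameter-map input described in the abstract, the sketch below rasterizes per-region attribute values onto a segmentation mask. Function names, the choice of attributes, and array shapes are assumptions for illustration, not the paper's actual implementation:

```python
import numpy as np

def build_parameter_map(seg_mask, region_params, num_attrs=3, default=0.0):
    """Rasterize per-region attribute values into a dense parameter map.

    seg_mask:      (H, W) int array of semantic region labels.
    region_params: {label: (num_attrs,) array-like}, e.g. per-region
                   exposure / contrast / saturation adjustments.
    Returns:       (H, W, num_attrs) float32 map, one channel per attribute.
    """
    h, w = seg_mask.shape
    pmap = np.full((h, w, num_attrs), default, dtype=np.float32)
    for label, params in region_params.items():
        pmap[seg_mask == label] = np.asarray(params, dtype=np.float32)
    return pmap

# Toy example: sky (label 1) gets a brightness boost, foreground (label 0)
# keeps the default (no adjustment).
mask = np.zeros((4, 4), dtype=int)
mask[:2] = 1
pmap = build_parameter_map(mask, {1: [0.5, 0.0, 0.1]})
```

A map like this gives the diffusion model an explicit, spatially aligned conditioning signal, which is what makes the parameter-to-image mapping fine-grained rather than global.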
Problem

Research questions and friction points this paper is trying to address.

Balancing controllability and subjectivity in personalized image retouching
Connecting natural language instructions with visual aesthetic control
Maintaining global aesthetics while enabling semantic-level image enhancement
Innovation

Methods, ideas, or system contributions that make the work stand out.

Diffusion-based framework for personalized image retouching
Semantic replacement and parameter perturbation mechanisms
VLM-driven agent with feedback rethinking and memory
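The parameter perturbation idea in the list above can be sketched as a training-time augmentation: jitter each region's attribute values slightly so the model cannot memorize exact parameter-output pairs and must attend to region boundaries instead. This is a hypothetical sketch; the paper's actual perturbation scheme may differ:

```python
import numpy as np

def perturb_parameter_map(pmap, seg_mask, sigma=0.05, rng=None):
    """Add small Gaussian noise to a parameter map, one shared offset
    per semantic region, as a boundary-awareness augmentation.

    pmap:     (H, W, A) float parameter map.
    seg_mask: (H, W) int array of semantic region labels.
    """
    rng = np.random.default_rng() if rng is None else rng
    out = pmap.copy()
    for label in np.unique(seg_mask):
        # Same offset everywhere inside a region, so region boundaries
        # stay sharp while absolute values vary across training samples.
        noise = rng.normal(0.0, sigma, size=pmap.shape[-1]).astype(pmap.dtype)
        out[seg_mask == label] += noise
    return out
```

Applying a shared offset per region (rather than per pixel) keeps the within-region signal constant, so the only cue that changes across samples is the value itself, not the boundary location.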
👥 Authors
Zewei Chang — VCIP, CS, Nankai University
Zheng-Peng Duan — Nankai University
Jianxing Zhang — Samsung R&D Institute China - Beijing (SRC-B)
Chun-Le Guo — VCIP, CS, Nankai University; NKIARI, Shenzhen Futian
Siyu Liu — VCIP, CS, Nankai University
Hyungju Chun — Camera Innovation Group, Samsung Electronics
Hyunhee Park — Camera Innovation Group, Samsung Electronics
Zikun Liu — Samsung R&D Institute China - Beijing (SRC-B)
Chongyi Li — Professor, Nankai University