BeautyGRPO: Aesthetic Alignment for Face Retouching via Dynamic Path Guidance and Fine-Grained Preference Modeling

📅 2026-03-01
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the challenge that existing image retouching methods struggle to model subjective aesthetic preferences. While online reinforcement learning can align outputs with human preferences, its reliance on random exploration often introduces noise that compromises high-fidelity results. To mitigate this, the authors propose a Dynamic Path Guidance (DPG) mechanism that dynamically replans sampling trajectories via anchor-based ordinary differential equation (ODE) paths, effectively suppressing random drift while preserving exploratory capability. Additionally, they introduce FRPref-10K, the first fine-grained facial retouching preference dataset spanning five retouching dimensions, along with a dedicated reward model. Experiments demonstrate that the proposed approach outperforms both specialized and general-purpose retouching models in texture quality, blemish removal accuracy, and overall aesthetic alignment, achieving significantly higher consistency with human preferences.

📝 Abstract
Face retouching requires removing subtle imperfections while preserving unique facial identity features, in order to enhance overall aesthetic appeal. However, existing methods suffer from a fundamental trade-off. Supervised learning on labeled data is constrained to pixel-level label mimicry, failing to capture complex subjective human aesthetic preferences. Conversely, while online reinforcement learning (RL) excels at preference alignment, its stochastic exploration paradigm conflicts with the high-fidelity demands of face retouching and often introduces noticeable noise artifacts due to accumulated stochastic drift. To address these limitations, we propose BeautyGRPO, a reinforcement learning framework that aligns face retouching with human aesthetic preferences. We construct FRPref-10K, a fine-grained preference dataset covering five key retouching dimensions, and train a specialized reward model capable of evaluating subtle perceptual differences. To reconcile exploration and fidelity, we introduce Dynamic Path Guidance (DPG). DPG stabilizes the stochastic sampling trajectory by dynamically computing an anchor-based ODE path and replanning a guided trajectory at each sampling timestep, effectively correcting stochastic drift while maintaining controlled exploration. Extensive experiments show that BeautyGRPO outperforms both specialized face retouching methods and general image editing models, achieving superior texture quality, more accurate blemish removal, and overall results that better align with human aesthetic preferences.
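The abstract describes Dynamic Path Guidance as computing an anchor via a deterministic ODE step at each sampling timestep and replanning the stochastic trajectory toward that anchor. The paper's actual formulation is not given here, so the following is only a minimal toy sketch of that idea: `velocity` stands in for the learned velocity/score network (hypothetical placeholder), and `guidance` and `noise_scale` are illustrative knobs, not the paper's parameters.

```python
import numpy as np

def velocity(x, t):
    # Hypothetical stand-in for the learned velocity/score network.
    # Toy dynamics: drift toward the origin.
    return -x

def dpg_step(x, t, dt, guidance=0.5, noise_scale=0.1, rng=None):
    """One sampling step with a DPG-style correction (sketch, not the paper's method).

    The anchor is the deterministic ODE step; the stochastic exploration
    sample is then pulled back toward the anchor, suppressing random drift
    while keeping some exploration.
    """
    if rng is None:
        rng = np.random.default_rng(0)
    v = velocity(x, t)
    anchor = x + v * dt                                   # anchor-based ODE path (deterministic)
    explored = anchor + noise_scale * np.sqrt(dt) * rng.standard_normal(x.shape)
    # Replan: interpolate the stochastic sample toward the anchor trajectory.
    return (1.0 - guidance) * explored + guidance * anchor

def sample(x0, steps=10):
    x, t, dt = x0, 0.0, 1.0 / steps
    rng = np.random.default_rng(42)
    for _ in range(steps):
        x = dpg_step(x, t, dt, rng=rng)
        t += dt
    return x
```

With `guidance=0.0` this reduces to an ordinary stochastic (SDE-like) step, and with `guidance=1.0` to the pure ODE path, which matches the abstract's framing of DPG as a trade-off between exploration and fidelity.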
Problem

Research questions and friction points this paper is trying to address.

face retouching
aesthetic preference alignment
stochastic drift
high-fidelity image editing
subjective human aesthetics
Innovation

Methods, ideas, or system contributions that make the work stand out.

Dynamic Path Guidance
Fine-Grained Preference Modeling
Aesthetic Alignment
Reinforcement Learning for Image Editing
Face Retouching
Jiachen Yang
School of Cyber Science and Technology, Shenzhen Campus of Sun Yat-sen University, China

Xianhui Lin
Tongyi Lab, Alibaba Group
Computer Vision · Low-level Vision · Video Generation

Yi Dong
vivo BlueImage Lab, vivo Mobile Communication Co., Ltd., China

Zebiao Zheng
vivo BlueImage Lab, vivo Mobile Communication Co., Ltd., China

Xing Liu
vivo BlueImage Lab, vivo Mobile Communication Co., Ltd., China

Hong Gu
National Institute on Drug Abuse, NIH
functional MRI · functional connectivity · drug addiction

Yanmei Fang
School of Cyber Science and Technology, Shenzhen Campus of Sun Yat-sen University, China; Guangdong Provincial Key Laboratory of Information Security Technology, China