🤖 AI Summary
This work addresses a key limitation of existing image retouching methods: they struggle to model subjective aesthetic preferences. While online reinforcement learning can align outputs with human preferences, its reliance on random exploration often introduces noise that compromises high-fidelity results. To mitigate this, the authors propose a Dynamic Path Guidance (DPG) mechanism that replans sampling trajectories via anchor-based ordinary differential equation (ODE) paths, suppressing random drift while preserving exploratory capability. They also introduce FRPref-10K, the first fine-grained facial retouching preference dataset spanning five retouching dimensions, along with a dedicated reward model. Experiments show that the proposed approach outperforms both specialized and general-purpose retouching models in texture quality, blemish removal accuracy, and overall aesthetic alignment, achieving significantly higher consistency with human preferences.
📝 Abstract
Face retouching requires removing subtle imperfections while preserving unique facial identity features in order to enhance overall aesthetic appeal. However, existing methods suffer from a fundamental trade-off. Supervised learning on labeled data is constrained to pixel-level label mimicry and fails to capture complex, subjective human aesthetic preferences. Conversely, while online reinforcement learning (RL) excels at preference alignment, its stochastic exploration paradigm conflicts with the high-fidelity demands of face retouching and often introduces noticeable noise artifacts due to accumulated stochastic drift. To address these limitations, we propose BeautyGRPO, a reinforcement learning framework that aligns face retouching with human aesthetic preferences. We construct FRPref-10K, a fine-grained preference dataset covering five key retouching dimensions, and train a specialized reward model capable of evaluating subtle perceptual differences. To reconcile exploration and fidelity, we introduce Dynamic Path Guidance (DPG). DPG stabilizes the stochastic sampling trajectory by dynamically computing an anchor-based ODE path and replanning a guided trajectory at each sampling timestep, effectively correcting stochastic drift while maintaining controlled exploration. Extensive experiments show that BeautyGRPO outperforms both specialized face retouching methods and general image editing models, achieving superior texture quality, more accurate blemish removal, and overall results that better align with human aesthetic preferences.
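The abstract's core idea, blending a stochastic exploration step with a deterministic ODE anchor at each sampling timestep, can be illustrated with a toy sketch. This is not the paper's actual formulation: the function names, the simple linear drift, and the convex-combination blending rule are all illustrative assumptions; the real method operates on diffusion sampling trajectories with a learned model.

```python
import numpy as np

def dpg_sample(x0, drift, n_steps=50, dt=0.02, noise_scale=0.1,
               guidance=0.5, seed=0):
    """Toy sketch of anchor-guided sampling in the spirit of DPG.

    At each timestep we take a stochastic exploration step, compute a
    deterministic ODE "anchor" step from the same state, and replan by
    blending the two with a guidance weight. The blending rule is an
    illustrative assumption, not the paper's exact mechanism.
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    for _ in range(n_steps):
        # Stochastic exploration step (Euler-Maruyama-style).
        noise = noise_scale * np.sqrt(dt) * rng.standard_normal(x.shape)
        x_explore = x + drift(x) * dt + noise
        # Deterministic ODE anchor from the same state.
        x_anchor = x + drift(x) * dt
        # Replan: pull the stochastic step back toward the anchor path,
        # suppressing accumulated stochastic drift.
        x = (1.0 - guidance) * x_explore + guidance * x_anchor
    return x

# Simple contracting drift toward the origin (stand-in for a sampler).
drift = lambda x: -x
x_free = dpg_sample([1.0, -1.0], drift, guidance=0.0)  # pure stochastic
x_dpg = dpg_sample([1.0, -1.0], drift, guidance=0.8)   # anchor-guided
x_ode = dpg_sample([1.0, -1.0], drift, guidance=1.0,
                   noise_scale=0.0)                    # pure ODE path
```

With this toy drift, raising the guidance weight pulls the trajectory toward the deterministic ODE endpoint while some exploration noise survives, which mirrors the abstract's claim of correcting stochastic drift without eliminating exploration.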