🤖 AI Summary
Existing makeup transfer methods struggle with the complex and diverse makeup styles encountered in real-world scenarios. This paper proposes Stable-Makeup, a diffusion-based makeup transfer framework designed for such scenarios. Built on a pre-trained diffusion model, it combines three components: a Detail-Preserving makeup encoder that captures fine makeup details from the reference image, content and structural control modules that preserve the identity and facial geometry of the source image, and makeup cross-attention layers added to the U-Net that place the makeup at the corresponding positions on the source face. A content-structure decoupling training scheme helps the model keep the source content and facial structure intact. Beyond standard makeup transfer, the method also supports cross-domain makeup transfer and makeup-guided text-to-image generation. Extensive experiments show state-of-the-art results among existing makeup transfer methods, with strong robustness and generalization across diverse makeup styles. The code is publicly available.
📝 Abstract
Current makeup transfer methods are limited to simple makeup styles, making them difficult to apply in real-world scenarios. In this paper, we introduce Stable-Makeup, a novel diffusion-based makeup transfer method capable of robustly transferring a wide range of real-world makeup onto user-provided faces. Stable-Makeup is based on a pre-trained diffusion model and uses a Detail-Preserving (D-P) makeup encoder to encode makeup details. It also employs content and structural control modules to preserve the content and structural information of the source image. With the aid of our newly added makeup cross-attention layers in the U-Net, we can accurately transfer the detailed makeup to the corresponding position in the source image. After content-structure decoupling training, Stable-Makeup maintains the content and facial structure of the source image. Moreover, our method demonstrates strong robustness and generalizability, making it applicable to various tasks such as cross-domain makeup transfer and makeup-guided text-to-image generation. Extensive experiments demonstrate that our approach delivers state-of-the-art (SOTA) results among existing makeup transfer methods and shows promise for a broad range of applications in related fields. Code is available at https://github.com/Xiaojiu-z/Stable-Makeup
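
To illustrate the idea behind the makeup cross-attention layers, here is a minimal sketch of generic scaled dot-product cross-attention in plain Python. It is not the Stable-Makeup implementation: the names `cross_attention`, `source_features`, `makeup_tokens`, and `makeup_values` are hypothetical, the numbers are toy values, and the learned projections, multiple heads, and residual connections of a real U-Net layer are omitted. It only shows how each source-face position can pull in the makeup features it matches best.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of scores."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys, values):
    """Scaled dot-product cross-attention.

    queries: features of source-face positions (assumed role)
    keys, values: makeup embeddings from a reference image (assumed role)
    Returns one attended vector per query.
    """
    d = len(keys[0])
    out = []
    for q in queries:
        # Similarity of this source position to every makeup token.
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        # Weighted mix of makeup values for this source position.
        out.append([
            sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))
        ])
    return out


# Toy data: two source positions and two makeup tokens (illustrative only).
source_features = [[1.0, 0.0], [0.0, 1.0]]
makeup_tokens = [[4.0, 0.0], [0.0, 4.0]]
makeup_values = [[1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]

attended = cross_attention(source_features, makeup_tokens, makeup_values)
```

In this toy setup the first source position aligns with the first makeup token, so its output is dominated by that token's value vector, and likewise for the second position. This mirrors, in a very simplified way, how attention can route makeup details to the matching facial regions.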