Stable-Makeup: When Real-World Makeup Transfer Meets Diffusion Model

📅 2024-03-12
🏛️ arXiv.org
📈 Citations: 12
Influential: 0
🤖 AI Summary
Existing makeup transfer methods struggle with the complex and diverse makeup styles encountered in real-world scenarios. This paper proposes Stable-Makeup, a diffusion-based framework for high-fidelity makeup transfer in realistic settings. Built on a pre-trained diffusion model, it combines a Detail-Preserving makeup encoder that captures fine-grained makeup details, content and structure control modules that preserve the identity, facial geometry, and semantics of the source image, and newly added makeup cross-attention layers in the U-Net that map makeup details onto the corresponding facial regions. After content-structure decoupling training, the framework supports not only cross-domain makeup transfer but also makeup-guided text-to-image generation. Extensive experiments show state-of-the-art results among existing makeup transfer methods, with strong robustness and generalizability across diverse real-world makeup styles. The code is publicly available.

📝 Abstract
Current makeup transfer methods are limited to simple makeup styles, making them difficult to apply in real-world scenarios. In this paper, we introduce Stable-Makeup, a novel diffusion-based makeup transfer method capable of robustly transferring a wide range of real-world makeup onto user-provided faces. Stable-Makeup is based on a pre-trained diffusion model and utilizes a Detail-Preserving (D-P) makeup encoder to encode makeup details. It also employs content and structural control modules to preserve the content and structural information of the source image. With the aid of our newly added makeup cross-attention layers in the U-Net, we can accurately transfer the detailed makeup to the corresponding position in the source image. After content-structure decoupling training, Stable-Makeup can maintain the content and facial structure of the source image. Moreover, our method has demonstrated strong robustness and generalizability, making it applicable to various tasks such as cross-domain makeup transfer and makeup-guided text-to-image generation. Extensive experiments have demonstrated that our approach delivers state-of-the-art (SOTA) results among existing makeup transfer methods and exhibits highly promising potential for broad applications in various related fields. Code released: https://github.com/Xiaojiu-z/Stable-Makeup
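
To make the makeup cross-attention idea concrete, below is a minimal sketch of such a layer, assuming a standard multi-head cross-attention design in which flattened U-Net features act as queries and tokens from the D-P makeup encoder act as keys and values. The class name, argument names, and dimensions are illustrative assumptions, not the authors' implementation (see the linked repository for the actual code).

```python
# Minimal sketch of a makeup cross-attention layer (illustrative, not the
# paper's code): U-Net spatial features attend to makeup-encoder tokens.
import torch
import torch.nn as nn

class MakeupCrossAttention(nn.Module):
    def __init__(self, dim: int, makeup_dim: int, num_heads: int = 8):
        super().__init__()
        self.norm = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(
            embed_dim=dim, num_heads=num_heads,
            kdim=makeup_dim, vdim=makeup_dim, batch_first=True,
        )

    def forward(self, hidden_states: torch.Tensor,
                makeup_tokens: torch.Tensor) -> torch.Tensor:
        # hidden_states: (B, H*W, dim) flattened U-Net features (queries)
        # makeup_tokens: (B, N, makeup_dim) encoder output (keys/values)
        attn_out, _ = self.attn(self.norm(hidden_states),
                                makeup_tokens, makeup_tokens)
        # Residual connection leaves the original content pathway intact.
        return hidden_states + attn_out
```

In a full pipeline, one such layer would typically sit alongside the existing text cross-attention in each U-Net block, so that makeup tokens steer appearance while the residual path preserves the source content.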
Problem

Research questions and friction points this paper is trying to address.

Transferring diverse real-world makeup styles robustly
Preserving source image content and facial structure
Enabling cross-domain makeup transfer and text-to-image generation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Diffusion-based makeup transfer method
Detail-Preserving makeup encoder
Content-structure decoupling training (a minimal training-step sketch follows this list)
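
To make the decoupling idea concrete, here is a minimal sketch of one noise-prediction training step under a standard DDPM objective. The module interfaces (makeup_encoder, content_ctrl, structure_ctrl, and the unet's control/context arguments) are hypothetical stand-ins for the paper's components, not its actual API.

```python
# Hedged sketch of one diffusion training step with decoupled conditioning.
# All module interfaces here are hypothetical stand-ins, not the paper's API.
import torch
import torch.nn.functional as F

def training_step(unet, makeup_encoder, content_ctrl, structure_ctrl,
                  alphas_cumprod, x0, content_img, structure_map, makeup_ref):
    b = x0.shape[0]
    # Sample a diffusion timestep and noise the clean image: q(x_t | x_0).
    t = torch.randint(0, alphas_cumprod.shape[0], (b,), device=x0.device)
    a_t = alphas_cumprod[t].view(b, 1, 1, 1)
    noise = torch.randn_like(x0)
    x_t = a_t.sqrt() * x0 + (1.0 - a_t).sqrt() * noise

    # Makeup details enter as cross-attention context; content and structure
    # enter through separate control modules, so the two signals stay decoupled.
    makeup_tokens = makeup_encoder(makeup_ref)
    control = content_ctrl(content_img) + structure_ctrl(structure_map)

    noise_pred = unet(x_t, t, control=control, context=makeup_tokens)
    return F.mse_loss(noise_pred, noise)
```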
Yuxuan Zhang
Shanghai Jiao Tong University
Lifu Wei
Peking University
Qing Zhang
Shenyang Institute of Automation, Chinese Academy of Sciences
Yiren Song
Ph.D. student, National University of Singapore
Generative AI · Diffusion · Unified model
Jiaming Liu
Xiaohongshu Inc.
Huaxia Li
Xiaohongshu Inc.
Xu Tang
Xiaohongshu Inc.
Yao Hu
Zhejiang University
Machine Learning
Haibo Zhao
Xiaohongshu Inc.