Face-MakeUp: Multimodal Facial Prompts for Text-to-Image Generation

📅 2025-01-05
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing text-to-face generation methods suffer from weak textual representation capabilities, limiting precise control over facial details (e.g., makeup) and identity consistency. To address this, we propose a diffusion model enhancement framework that integrates multimodal features from reference faces. Specifically, we construct FaceCaptionHQ-4M—a high-quality dataset of 4 million face-caption pairs—and introduce the first multi-scale facial content and pose joint encoding mechanism, explicitly modeling identity, geometric structure, and semantic attributes. This encoder is embedded into the UNet’s conditional control path to enable fine-grained, reference-guided generation. Our method achieves state-of-the-art performance on two major face generation benchmarks, significantly outperforming purely text-driven and generic image-guided approaches—particularly in makeup fidelity and identity preservation. Both code and the FaceCaptionHQ-4M dataset are publicly released.

📝 Abstract
Facial images have extensive practical applications. Although current large-scale text-to-image diffusion models exhibit strong generative capabilities, it is difficult to produce the desired facial images using only a text prompt. Image prompts are a logical choice; however, current methods of this type generally focus on the general domain. In this paper, we aim to optimize image makeup techniques to generate the desired facial images. Specifically, (1) we build a dataset of 4 million high-quality face image-text pairs (FaceCaptionHQ-4M) based on LAION-Face to train our Face-MakeUp model; (2) to maintain consistency with the reference facial image, we extract and learn multi-scale content features and pose features of the facial image and integrate them into the diffusion model to strengthen the preservation of facial identity. Validation on two face-related test datasets demonstrates that Face-MakeUp achieves the best overall performance. All code is available at: https://github.com/ddw2AIGROUP2CQUPT/Face-MakeUp
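The abstract describes injecting multi-scale content and pose features of a reference face into the diffusion model's conditioning path. As a rough, hedged illustration of that idea (not the authors' implementation; all token counts and shapes below are hypothetical), one common mechanism is to concatenate reference-face tokens onto the text tokens so that every cross-attention layer of the UNet can attend to identity and pose cues alongside the prompt:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(query, context, d):
    """Single-head scaled dot-product cross-attention.

    query:   UNet spatial features, shape (n_patches, d)
    context: conditioning tokens,   shape (n_tokens, d)
    """
    scores = query @ context.T / np.sqrt(d)
    return softmax(scores, axis=-1) @ context

rng = np.random.default_rng(0)
d = 64
text_tokens = rng.normal(size=(77, d))   # CLIP-style text embeddings
id_tokens = rng.normal(size=(4, d))      # multi-scale identity features (hypothetical)
pose_tokens = rng.normal(size=(1, d))    # pose embedding (hypothetical)

# Concatenate reference-face tokens onto the text context so the
# cross-attention layers see identity and pose cues with the prompt.
context = np.concatenate([text_tokens, id_tokens, pose_tokens], axis=0)

unet_features = rng.normal(size=(16, d))  # one spatial resolution of the UNet
out = cross_attention(unet_features, context, d)
print(out.shape)  # (16, 64)
```

In a real diffusion pipeline the projections for queries, keys, and values would be learned, and the face encoder would be trained jointly so its tokens land in the same embedding space as the text tokens; this sketch only shows the conditioning geometry.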
Problem

Research questions and friction points this paper is trying to address.

Large-scale Models
Facial Image Generation
Makeup Effects
Innovation

Methods, ideas, or system contributions that make the work stand out.

FaceCaptionHQ-4M
Diverse Facial Features Incorporation
Textual Description-based Facial Image Generation
👥 Authors
Dawei Dai — Chongqing University of Posts and Telecommunications
Mingming Jia — Chongqing Key Laboratory of Computational Intelligence, Chongqing University of Posts and Telecommunications, 400065, Chongqing, China
Yinxiu Zhou — Chongqing Key Laboratory of Computational Intelligence, Chongqing University of Posts and Telecommunications, 400065, Chongqing, China
Hang Xing — Chongqing Key Laboratory of Computational Intelligence, Chongqing University of Posts and Telecommunications, 400065, Chongqing, China
Chenghang Li — Chongqing Key Laboratory of Computational Intelligence, Chongqing University of Posts and Telecommunications, 400065, Chongqing, China