ConsistentID: Portrait Generation with Multimodal Fine-Grained Identity Preserving

📅 2024-04-25
🏛️ arXiv.org
📈 Citations: 9
Influential: 1
🤖 AI Summary
Existing diffusion models struggle to preserve fine-grained facial details and maintain holistic identity consistency at the same time when personalized faces are generated from a single reference image. To address this, we propose ConsistentID, a multimodal facial prompting framework for identity-preserving diffusion with three contributions: (1) a fine-grained multimodal facial prompt generator that combines facial region features, the corresponding textual descriptions, and the overall facial context; (2) an identity-preservation network optimized with a facial attention localization strategy; and (3) the Fine-Grained Identity Dataset (FGID), a portrait dataset of over 500,000 facial images that is more diverse and comprehensive than existing public facial datasets. On the MyStyle dataset, the method surpasses existing methods in identity-preservation precision and diversity, and it keeps inference fast despite the additional multimodal identity information.
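The prompting mechanism in (1) can be pictured with a small sketch. Assuming that facial-part words in the caption (e.g. "eyes") mark where region features enter the text-encoder sequence, the function below blends each such token's embedding with the visual feature of the matching facial region. The name `fuse_facial_prompt`, the weight `alpha`, and the simple convex-blend rule are hypothetical stand-ins; the summary does not say how ConsistentID fuses the modalities, and a learned projection would be the more typical choice in practice.

```python
# Hypothetical sketch: inject facial-region features into a text prompt
# embedding at the positions of facial-part words. The blend rule and all
# names are illustrative assumptions, not ConsistentID's implementation.
from typing import Dict, List

Vector = List[float]


def fuse_facial_prompt(
    tokens: List[str],
    token_embeds: List[Vector],
    region_feats: Dict[str, Vector],
    alpha: float = 0.5,
) -> List[Vector]:
    """Blend each facial-part token's text embedding with its region feature.

    Tokens without a matching region feature keep their text embedding, so
    the rest of the prompt (style, scene, pose words) is left untouched.
    """
    if len(tokens) != len(token_embeds):
        raise ValueError("tokens and token_embeds must have the same length")
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    fused: List[Vector] = []
    for token, embed in zip(tokens, token_embeds):
        feat = region_feats.get(token)
        if feat is None:
            fused.append(list(embed))  # non-facial word: copy unchanged
            continue
        if len(feat) != len(embed):
            raise ValueError(f"dimension mismatch for region {token!r}")
        # Convex combination of the textual and visual descriptions.
        fused.append([(1 - alpha) * t + alpha * v for t, v in zip(embed, feat)])
    return fused


if __name__ == "__main__":
    tokens = ["a", "man", "with", "blue", "eyes"]
    embeds = [[float(i), 0.0] for i in range(len(tokens))]
    regions = {"eyes": [2.0, 2.0]}  # toy stand-in for an eye-region feature
    print(fuse_facial_prompt(tokens, embeds, regions)[-1])  # [3.0, 1.0]
```

Only the "eyes" position changes; every other token keeps its original embedding, which is the property that lets region-level identity cues coexist with free-form text in the same prompt.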

📝 Abstract
Diffusion-based technologies have made significant strides, particularly in personalized and customized facial generation. However, existing methods face challenges in achieving high-fidelity and detailed identity (ID) consistency, primarily due to insufficient fine-grained control over facial areas and the lack of a comprehensive strategy for ID preservation that fully considers both intricate facial details and the overall face. To address these limitations, we introduce ConsistentID, an innovative method crafted for diverse identity-preserving portrait generation under fine-grained multimodal facial prompts, utilizing only a single reference image. ConsistentID comprises two key components: a multimodal facial prompt generator that combines facial features, corresponding facial descriptions, and the overall facial context to enhance precision in facial details, and an ID-preservation network optimized through the facial attention localization strategy, aimed at preserving ID consistency in facial regions. Together, these components significantly enhance the accuracy of ID preservation by introducing fine-grained multimodal ID information from facial regions. To facilitate training of ConsistentID, we present a fine-grained portrait dataset, FGID, with over 500,000 facial images, offering greater diversity and comprehensiveness than existing public facial datasets. Experimental results substantiate that our ConsistentID achieves exceptional precision and diversity in personalized facial generation, surpassing existing methods on the MyStyle dataset. Furthermore, while ConsistentID introduces more multimodal ID information, it maintains a fast inference speed during generation.
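The abstract does not spell out the facial attention localization strategy. A common formulation for this kind of objective, used for example in FastComposer, penalizes the cross-attention mass of a region token that falls outside that region's segmentation mask. The sketch below implements that loss in plain Python under the assumption that ConsistentID's strategy is similar; `localization_loss`, `total_localization_loss`, and the list-of-lists map format are hypothetical, not the paper's code.

```python
# Hypothetical sketch of a FastComposer-style attention localization loss.
# Assumption: ConsistentID's facial attention localization works similarly;
# the abstract does not give the exact formulation.
from typing import Dict, List

Grid = List[List[float]]  # cross-attention map of one token, H x W
Mask = List[List[int]]    # binary segmentation mask of one facial region, H x W


def localization_loss(attn: Grid, mask: Mask) -> float:
    """Mean attention outside the region minus mean attention inside it.

    The value is negative when attention concentrates on the region,
    so minimizing it pulls the token's attention onto its facial area.
    """
    if len(attn) != len(mask) or any(len(a) != len(m) for a, m in zip(attn, mask)):
        raise ValueError("attention map and mask must have the same shape")
    inside: List[float] = []
    outside: List[float] = []
    for attn_row, mask_row in zip(attn, mask):
        for value, is_region in zip(attn_row, mask_row):
            (inside if is_region else outside).append(value)
    if not inside or not outside:
        raise ValueError("mask must contain both region and background pixels")
    return sum(outside) / len(outside) - sum(inside) / len(inside)


def total_localization_loss(attn_maps: Dict[str, Grid], masks: Dict[str, Mask]) -> float:
    """Average the per-region loss over every facial token that has a mask."""
    keys = [k for k in attn_maps if k in masks]
    if not keys:
        raise ValueError("no facial token has a matching mask")
    return sum(localization_loss(attn_maps[k], masks[k]) for k in keys) / len(keys)


if __name__ == "__main__":
    eye_mask = [[1, 0], [0, 0]]
    focused = [[1.0, 0.0], [0.0, 0.0]]
    uniform = [[0.25, 0.25], [0.25, 0.25]]
    print(localization_loss(focused, eye_mask))  # -1.0
    print(localization_loss(uniform, eye_mask))  # 0.0
```

Attention that sits entirely on the eye region scores -1.0, while attention spread evenly over the image scores 0.0, so gradient descent on this quantity rewards tokens whose attention stays inside their own facial area.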
Problem

Research questions and friction points this paper is trying to address.

Diffusion Techniques
Facial Detail Consistency
Identity Feature Preservation
Innovation

Methods, ideas, or system contributions that make the work stand out.

ConsistentID
Identity Consistency
FGID Dataset
Jiehui Huang
Sun Yat-sen University
Machine Learning, Computer Vision, Embodied AI, Materials Science
Xiao Dong
Unknown affiliation
DM CV ML
Wenhui Song
School of Artificial Intelligence, Shenzhen Campus, Sun Yat-Sen University, Shenzhen, P.R. China, 518107
Hanhui Li
Sun Yat-sen University
Deep Learning, Computer Vision
Jun Zhou
School of Artificial Intelligence, Shenzhen Campus, Sun Yat-Sen University, Shenzhen, P.R. China, 518107
Yuhao Cheng
Lenovo Research Group, Shenzhen, P.R. China, 518038
Long Chen
Lenovo Research Group, Shenzhen, P.R. China, 518038
Yiqiang Yan
Lenovo
Shengcai Liao
College of Information Technology, United Arab Emirates University, Al Ain, UAE
Xiaodan Liang
Professor of Computer Science, Sun Yat-sen University, MBZUAI, CMU, NUS
Computer vision, Embodied AI, Machine learning