🤖 AI Summary
Existing virtual makeup methods suffer from training instability and limited customization flexibility. To address these issues, this paper proposes a training-free framework built on latent diffusion models. Our approach preserves facial structure and identity features via early-stopped DDIM inversion, enabling high-fidelity, fine-grained makeup editing conditioned on multimodal inputs, including reference images, RGB color values, and textual descriptions. We introduce a novel multi-condition guidance mechanism and integrate a text encoder with a large language model interface, ensuring high editing quality at low computational overhead. Extensive experiments demonstrate that our method significantly outperforms state-of-the-art GAN- and diffusion-based approaches in identity preservation, color-matching accuracy, text alignment, and customization freedom. To the best of our knowledge, this is the first diffusion-based virtual makeup framework to achieve multimodal controllability, zero-shot adaptation, and high fidelity without any training.
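To make the inversion step concrete, below is a minimal PyTorch sketch of early-stopped DDIM inversion. The `eps_model` placeholder, the linear beta schedule, the `stride`, and the choice of `t_stop` are illustrative assumptions rather than the paper's released code; the point is only that inversion halts before the latent becomes pure noise, so coarse facial structure and identity carry through to the editing stage.

```python
import torch

# Hypothetical noise-prediction network; stands in for the paper's
# pretrained latent-diffusion UNet (an assumption, not the authors' code).
def eps_model(x_t: torch.Tensor, t: int) -> torch.Tensor:
    return torch.zeros_like(x_t)  # placeholder prediction

# Standard DDPM linear beta schedule -> cumulative alphas.
T = 1000
betas = torch.linspace(1e-4, 0.02, T)
alphas_cumprod = torch.cumprod(1.0 - betas, dim=0)

@torch.no_grad()
def early_stopped_ddim_inversion(x0: torch.Tensor, t_stop: int, stride: int = 20):
    """Deterministically invert a clean latent x0 toward noise, but stop
    near t_stop < T so facial structure and identity are preserved.
    Editing then re-denoises from this partially inverted latent."""
    x = x0
    timesteps = list(range(0, t_stop, stride))
    for t_cur, t_next in zip(timesteps[:-1], timesteps[1:]):
        a_cur, a_next = alphas_cumprod[t_cur], alphas_cumprod[t_next]
        eps = eps_model(x, t_cur)
        # Recover the x0 estimate implied by the current latent ...
        x0_pred = (x - (1 - a_cur).sqrt() * eps) / a_cur.sqrt()
        # ... and re-noise it deterministically to the next timestep.
        x = a_next.sqrt() * x0_pred + (1 - a_next).sqrt() * eps
    return x  # partially inverted latent near t_stop

# Usage: invert an encoded face latent only up to step 400 of 1000.
latent = torch.randn(1, 4, 64, 64)  # stand-in for a VAE-encoded face image
x_tstop = early_stopped_ddim_inversion(latent, t_stop=400)
```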
📝 Abstract
The exponential growth of the global makeup market has paralleled advancements in virtual makeup simulation technology. Despite the progress led by GANs, their application still encounters significant challenges, including training instability and limited customization capabilities. To address these challenges, we introduce DreamMakeup, a novel training-free, diffusion-model-based makeup customization method that leverages the inherent advantages of diffusion models for superior controllability and precise real-image editing. DreamMakeup employs early-stopped DDIM inversion to preserve facial structure and identity while enabling extensive customization through various conditioning inputs such as reference images, specific RGB colors, and textual descriptions. Our model demonstrates notable improvements over existing GAN-based and recent diffusion-based frameworks in customization, color matching, identity preservation, and compatibility with textual descriptions or LLMs, all at affordable computational cost.
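As a rough illustration of how several conditions could steer a single denoising step, the sketch below combines condition-specific noise predictions in a classifier-free-guidance style. The branch functions and guidance weights are hypothetical stand-ins; the paper's actual multi-condition guidance mechanism may differ in form and weighting.

```python
import torch

# Hypothetical conditional noise predictors; each name is an assumption
# standing in for one of the paper's conditioning branches.
def eps_uncond(x, t): return torch.zeros_like(x)
def eps_text(x, t):   return torch.zeros_like(x)   # text / LLM prompt branch
def eps_ref(x, t):    return torch.zeros_like(x)   # reference-image branch
def eps_rgb(x, t):    return torch.zeros_like(x)   # target RGB color branch

@torch.no_grad()
def multi_condition_eps(x, t, w_text=3.0, w_ref=2.0, w_rgb=1.0):
    """Classifier-free-guidance-style combination of several conditions:
    each branch's deviation from the unconditional prediction is scaled
    and summed, so text, a reference image, and an RGB color can jointly
    steer the same denoising step. Weights are illustrative only."""
    e0 = eps_uncond(x, t)
    return (e0
            + w_text * (eps_text(x, t) - e0)
            + w_ref  * (eps_ref(x, t)  - e0)
            + w_rgb  * (eps_rgb(x, t)  - e0))
```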