MASQUE: A Text-Guided Diffusion-Based Framework for Localized and Customized Adversarial Makeup

📅 2025-03-13
📈 Citations: 0
✨ Influential: 0
📄 PDF
🤖 AI Summary
Existing generative adversarial makeup methods rely heavily on modeling a specific target identity, which yields low dodging success rates, raises the risk of targeted misuse, and often introduces global artifacts or responds poorly to diverse text prompts. Method: We propose the first text-guided *localized* adversarial makeup generation framework, built on (1) precise null-text inversion for faithful reconstruction; (2) a mask-constrained customized cross-attention fusion mechanism for precise spatial control over makeup regions; and (3) pairwise adversarial guidance using images of the same individual, enabling target-identity-free dodging while jointly optimizing evasion efficacy, visual fidelity, and text prompt generalization. Contribution/Results: Our method significantly outperforms all baselines on both open-source facial recognition models and commercial APIs, achieving substantially higher dodging success rates while preserving high perceptual quality and strong adaptability across diverse text prompts, without requiring any identity-specific supervision.

๐Ÿ“ Abstract
As facial recognition is increasingly adopted for government and commercial services, its potential misuse has raised serious concerns about privacy and civil rights. To counteract, various anti-facial recognition techniques have been proposed for privacy protection by adversarially perturbing face images, among which generative makeup-based approaches are the most popular. However, these methods, designed primarily to impersonate specific target identities, can only achieve weak dodging success rates while increasing the risk of targeted abuse. In addition, they often introduce global visual artifacts or a lack of adaptability to accommodate diverse makeup prompts, compromising user satisfaction. To address the above limitations, we develop MASQUE, a novel diffusion-based framework that generates localized adversarial makeups guided by user-defined text prompts. Built upon precise null-text inversion, customized cross-attention fusion with masking, and a pairwise adversarial guidance mechanism using images of the same individual, MASQUE achieves robust dodging performance without requiring any external identity. Comprehensive evaluations on open-source facial recognition models and commercial APIs demonstrate that MASQUE significantly improves dodging success rates over all baselines, along with higher perceptual fidelity and stronger adaptability to various text makeup prompts.
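The dodging criterion described above can be sketched numerically: a perturbation succeeds when the recognizer's embedding of the protected photo no longer matches another photo of the same individual under the matching threshold. A minimal NumPy illustration, where the embeddings, the `dodges` helper, and the threshold are all hypothetical assumptions rather than details from the paper:

```python
import numpy as np

def cosine_sim(a, b):
    # Standard cosine similarity between two embedding vectors.
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def dodges(emb_protected, emb_same_person, threshold=0.5):
    # Dodging succeeds when the recognizer no longer matches the
    # protected photo to a reference photo of the same individual.
    return cosine_sim(emb_protected, emb_same_person) < threshold

rng = np.random.default_rng(0)
emb_clean = rng.normal(size=128)                  # embedding of the clean photo
emb_ref = emb_clean + 0.1 * rng.normal(size=128)  # same person, different photo
emb_adv = rng.normal(size=128)                    # embedding after adversarial makeup

print(dodges(emb_clean, emb_ref))  # False: the clean photo still matches
print(dodges(emb_adv, emb_ref))    # True: the perturbed photo evades the match
```

Because the guidance signal only needs a second image of the same person, no external target identity is ever involved, which is what distinguishes dodging from impersonation-style attacks.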
Problem

Research questions and friction points this paper is trying to address.

Enhance privacy protection against facial recognition misuse.
Generate localized adversarial makeups using text prompts.
Improve dodging success rates and perceptual fidelity.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Text-guided diffusion for localized makeup
Null-text inversion for precise customization
Pairwise adversarial guidance for robust dodging
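The masked cross-attention fusion listed above can be illustrated with a toy sketch: cross-attention maps computed from the makeup text prompt are applied only inside a binary face-region mask, while the original attention is kept everywhere else, which is what confines the edit spatially. All names, shapes, and the mask itself are illustrative assumptions, not the paper's implementation:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(queries, keys):
    # (pixels, d) x (tokens, d) -> (pixels, tokens) attention map.
    return softmax(queries @ keys.T / np.sqrt(queries.shape[-1]))

def masked_fusion(attn_orig, attn_makeup, mask):
    # Use makeup-prompt attention inside the mask and the original
    # attention outside it, so the edit stays in the chosen region.
    m = mask.reshape(-1, 1)  # (pixels, 1), broadcasts over tokens
    return m * attn_makeup + (1.0 - m) * attn_orig

rng = np.random.default_rng(1)
pixels, tokens, d = 16, 4, 8
q = rng.normal(size=(pixels, d))          # image-feature queries
k_orig = rng.normal(size=(tokens, d))     # original prompt token keys
k_makeup = rng.normal(size=(tokens, d))   # makeup prompt token keys
mask = np.zeros(pixels)
mask[:4] = 1.0                            # hypothetical "lip region"

fused = masked_fusion(cross_attention(q, k_orig),
                      cross_attention(q, k_makeup), mask)
```

In a real diffusion pipeline this blending would happen inside the denoiser's cross-attention layers at every sampling step; the sketch only shows the spatial-gating arithmetic.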
🔎 Similar Papers
2024-10-08 · International Conference on Learning Representations · Citations: 10
Youngjin Kwon (KAIST) · Xiao Zhang (CISPA Helmholtz Center for Information Security)