Zero-shot Face Editing via ID-Attribute Decoupled Inversion

📅 2025-10-13

📈 Citations: 0

✨ Influential: 0

career value

213K/year

🤖 AI Summary

Existing text-guided diffusion models struggle to simultaneously preserve identity fidelity and structural consistency in real-face editing. This paper proposes an ID-attribute disentangled inversion framework that enables zero-shot, purely text-driven multi-attribute face editing without any training. Methodologically, it employs joint conditional inversion to disentangle the latent space into identity-specific and appearance-attribute features, and introduces a reverse-diffusion mechanism to independently control both components; during generation, the disentangled representations collaboratively guide the diffusion process. Experiments demonstrate significant improvements over baselines in identity preservation (ID Similarity +12.3%), structural stability (LPIPS −0.18), and editing accuracy, with inference speed comparable to DDIM. The core contribution is the first zero-shot framework achieving complete disentanglement and independent control of identity and attributes, establishing an efficient, general-purpose paradigm for controllable face editing.

Technology Category

Application Category

📝 Abstract

Recent advancements in text-guided diffusion models have shown promise for general image editing via inversion techniques, but often struggle to maintain ID and structural consistency in real face editing tasks. To address this limitation, we propose a zero-shot face editing method based on ID-Attribute Decoupled Inversion. Specifically, we decompose the face representation into ID and attribute features, using them as joint conditions to guide both the inversion and the reverse diffusion processes. This allows independent control over ID and attributes, ensuring strong ID preservation and structural consistency while enabling precise facial attribute manipulation. Our method supports a wide range of complex multi-attribute face editing tasks using only text prompts, without requiring region-specific input, and operates at a speed comparable to DDIM inversion. Comprehensive experiments demonstrate its practicality and effectiveness.

Problem

Research questions and friction points this paper is trying to address.

Maintaining ID consistency in text-guided face editing tasks

Preserving structural integrity during facial attribute manipulation

Achieving precise multi-attribute control without region-specific input

Innovation

Methods, ideas, or system contributions that make the work stand out.

ID and attribute features decoupled for inversion

Joint conditions guide inversion and reverse diffusion

Independent control over ID and attributes for editing

🔎 Similar Papers

ID-Guard: A Universal Framework for Combating Facial Manipulation via Breaking Identification