Zero-shot Face Editing via ID-Attribute Decoupled Inversion

📅 2025-10-13
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Existing text-guided diffusion models struggle to simultaneously preserve identity fidelity and structural consistency in real-face editing. This paper proposes an ID-attribute disentangled inversion framework that enables zero-shot, purely text-driven multi-attribute face editing without any training. Methodologically, it employs joint conditional inversion to disentangle the latent space into identity-specific and appearance-attribute features, and introduces a reverse-diffusion mechanism to independently control both components; during generation, the disentangled representations collaboratively guide the diffusion process. Experiments demonstrate significant improvements over baselines in identity preservation (ID Similarity +12.3%), structural stability (LPIPS −0.18), and editing accuracy, with inference speed comparable to DDIM. The core contribution is the first zero-shot framework achieving complete disentanglement and independent control of identity and attributes, establishing an efficient, general-purpose paradigm for controllable face editing.

Technology Category

Application Category

📝 Abstract
Recent advancements in text-guided diffusion models have shown promise for general image editing via inversion techniques, but often struggle to maintain ID and structural consistency in real face editing tasks. To address this limitation, we propose a zero-shot face editing method based on ID-Attribute Decoupled Inversion. Specifically, we decompose the face representation into ID and attribute features, using them as joint conditions to guide both the inversion and the reverse diffusion processes. This allows independent control over ID and attributes, ensuring strong ID preservation and structural consistency while enabling precise facial attribute manipulation. Our method supports a wide range of complex multi-attribute face editing tasks using only text prompts, without requiring region-specific input, and operates at a speed comparable to DDIM inversion. Comprehensive experiments demonstrate its practicality and effectiveness.
Problem

Research questions and friction points this paper is trying to address.

Maintaining ID consistency in text-guided face editing tasks
Preserving structural integrity during facial attribute manipulation
Achieving precise multi-attribute control without region-specific input
Innovation

Methods, ideas, or system contributions that make the work stand out.

ID and attribute features decoupled for inversion
Joint conditions guide inversion and reverse diffusion
Independent control over ID and attributes for editing
🔎 Similar Papers
No similar papers found.
Yang Hou
Yang Hou
Soochow University
M
Minggu Wang
Graduate School and Faculty of Information Science and Electrical Engineering, Kyushu University, Fukuoka, Japan
Jianjun Zhao
Jianjun Zhao
Kyushu University
Software EngineeringProgramming Languages