🤖 AI Summary
This paper addresses two key challenges in diffusion-based face swapping: loss of the source identity and the conflict between identity and attribute conditioning. To tackle these, the authors propose an identity-constrained attribute-tuning framework with three components: (1) a decoupled condition injection mechanism that encodes identity features separately from pose, expression, and other attributes; (2) a progressive attribute alignment strategy that mitigates competition between conditions during generation; and (3) an identity loss and an adversarial loss applied in a post-training refinement stage to enhance identity fidelity. Quantitative and qualitative evaluations across multiple benchmarks show state-of-the-art performance: identity similarity (ID-Sim) improves by 12.6%, and attribute consistency improves by 23.4% as measured by LPIPS (that is, LPIPS decreases by 23.4%; lower is better), outperforming existing diffusion-based baselines.
📝 Abstract
Face swapping aims to seamlessly transfer a source facial identity onto a target while preserving target attributes such as pose and expression. Diffusion models, known for their superior generative capabilities, have recently shown promise in advancing face-swapping quality. This paper addresses two key challenges in diffusion-based face swapping: the prioritized preservation of identity over target attributes and the inherent conflict between identity and attribute conditioning. To tackle these issues, we introduce an identity-constrained attribute-tuning framework for face swapping that first ensures identity preservation and then fine-tunes for attribute alignment, achieved through decoupled condition injection. We further enhance fidelity by incorporating identity and adversarial losses in a post-training refinement stage. Our identity-constrained diffusion-based face-swapping model outperforms existing methods in both qualitative and quantitative evaluations. It demonstrates superior identity similarity and attribute consistency, achieving state-of-the-art performance in high-fidelity face swapping.
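The abstract names identity and adversarial losses for the refinement stage but does not give their form. The sketch below shows one common way such an objective is assembled, and it is an assumption rather than the paper's formulation: the identity term is taken as one minus the cosine similarity between face embeddings, the adversarial term as a non-saturating generator loss on discriminator logits, and the weights `w_id` and `w_adv` are hypothetical. The embeddings and logits would come from a face recognition network and a discriminator, neither of which is shown here.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two embedding vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("embeddings must be non-zero")
    return dot / (norm_a * norm_b)


def identity_loss(src_emb: Sequence[float], swap_emb: Sequence[float]) -> float:
    """Assumed form: 1 - cos(source embedding, swapped-face embedding)."""
    return 1.0 - cosine_similarity(src_emb, swap_emb)


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def generator_adversarial_loss(disc_logits: Sequence[float]) -> float:
    """Assumed form: non-saturating GAN loss, mean of softplus(-logit)."""
    if not disc_logits:
        raise ValueError("need at least one discriminator logit")
    return sum(softplus(-z) for z in disc_logits) / len(disc_logits)


def refinement_loss(
    src_emb: Sequence[float],
    swap_emb: Sequence[float],
    disc_logits: Sequence[float],
    w_id: float = 1.0,   # hypothetical weight, not from the paper
    w_adv: float = 0.1,  # hypothetical weight, not from the paper
) -> float:
    """Weighted sum of the identity and adversarial terms."""
    return (
        w_id * identity_loss(src_emb, swap_emb)
        + w_adv * generator_adversarial_loss(disc_logits)
    )
```

Under these assumptions, the identity term is zero when the swapped face's embedding points in the same direction as the source embedding, and the adversarial term shrinks as the discriminator scores the output as more realistic. In practice the two weights would need tuning so that neither term dominates.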