FaceSnap: Enhanced ID-Fidelity Network for Tuning-Free Portrait Customization

📅 2026-01-31
🏛️ International Conference on Artificial Neural Networks
📈 Citations: 1
Influential: 0
🤖 AI Summary
This work proposes a plug-and-play framework based on Stable Diffusion for high-fidelity personalized portrait generation from a single reference image, eliminating the need for time-consuming fine-tuning. Existing methods often suffer from limited generalization, poor identity preservation, or insufficient detail fidelity. To address these challenges, the proposed approach introduces three key components: a facial attribute mixer that fuses multi-level features, a landmark predictor to maintain identity across diverse poses, and an identity-preserving module embedded within the UNet architecture to enhance both detail quality and generation diversity. Experimental results demonstrate that the method significantly outperforms state-of-the-art approaches in terms of identity consistency and photorealistic detail preservation, achieving high-quality results in a single inference pass without model adaptation.

📝 Abstract
Benefiting from the significant advancements in text-to-image diffusion models, research in personalized image generation, particularly customized portrait generation, has also made great strides recently. However, existing methods either require time-consuming fine-tuning and lack generalizability or fail to achieve high fidelity in facial details. To address these issues, we propose FaceSnap, a novel method based on Stable Diffusion (SD) that requires only a single reference image and produces highly consistent results in a single inference stage. The method is plug-and-play and can be easily extended to different SD models. Specifically, we design a new Facial Attribute Mixer that extracts comprehensively fused information from both low-level specific features and high-level abstract features, providing better guidance for image generation. We also introduce a Landmark Predictor that preserves the reference identity across landmarks under different poses, providing diverse yet detailed spatial control conditions for image generation. We then use an ID-preserving module to inject these cues into the UNet. Experimental results demonstrate that our approach performs remarkably well in personalized and customized portrait generation, surpassing other state-of-the-art methods in this domain.
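The pipeline described in the abstract (fuse low- and high-level facial features, then inject the fused identity cues into the UNet via an ID-preserving module) can be sketched at toy scale. Everything below is a hypothetical illustration, not the paper's implementation: the function names, dimensions, additive fusion, and gated injection are assumptions, and the Landmark Predictor's spatial conditioning is omitted for brevity.

```python
import numpy as np

rng = np.random.default_rng(0)

def facial_attribute_mixer(low_feats, high_feats, w_low, w_high):
    """Hypothetical mixer: project low-level (detail) and high-level
    (abstract identity) features into a shared space, then fuse by addition."""
    return low_feats @ w_low + high_feats @ w_high

# Toy dimensions (hypothetical, not from the paper)
d_low, d_high, d_id = 64, 32, 48
w_low = rng.standard_normal((d_low, d_id)) * 0.1
w_high = rng.standard_normal((d_high, d_id)) * 0.1

low = rng.standard_normal(d_low)    # e.g. shallow-layer texture features
high = rng.standard_normal(d_high)  # e.g. deep identity embedding

id_tokens = facial_attribute_mixer(low, high, w_low, w_high)

def id_preserving_injection(unet_hidden, id_tokens, w_kv):
    """Sketch of injecting identity tokens into a UNet hidden state:
    one cross-attention-like step, simplified to a single query vector
    with a sigmoid gate deciding how strongly identity cues are mixed in."""
    score = unet_hidden @ w_kv @ id_tokens   # scalar attention logit
    gate = 1.0 / (1.0 + np.exp(-score))      # gate in (0, 1)
    return unet_hidden + gate * (id_tokens @ w_kv.T)

w_kv = rng.standard_normal((d_id, d_id)) * 0.1
hidden = rng.standard_normal(d_id)           # stand-in for a UNet activation
out = id_preserving_injection(hidden, id_tokens, w_kv)
print(out.shape)
```

In an actual SD-based system the injection step would be a full cross-attention layer over token sequences inside the UNet blocks; the scalar gate here only conveys the idea of blending identity cues into existing activations without replacing them.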
Problem

Research questions and friction points this paper is trying to address.

personalized image generation
portrait customization
ID-fidelity
facial detail fidelity
generalizability
Innovation

Methods, ideas, or system contributions that make the work stand out.

FaceSnap
ID-fidelity
tuning-free
Facial Attribute Mixer
Landmark Predictor
Benxiang Zhai
Vision AI System Lab, Nanjing University, Nanjing, China
Yifang Xu
Vision AI System Lab, Nanjing University, Nanjing, China
Guofeng Zhang
Research and Development Department, Wonxing Technology, Shanghai, China
Yang Li
Vision AI System Lab, Nanjing University, Nanjing, China
Sidan Du
Nanjing University
Image Processing and Control
Machine Learning