HiFi-Portrait: Zero-shot Identity-preserved Portrait Generation with High-fidelity Multi-face Fusion

πŸ“… 2025-12-16
πŸ“ˆ Citations: 0
✨ Influential: 0
πŸ“„ PDF
πŸ€– AI Summary
Existing diffusion models for multi-reference image-guided identity-preserving portrait generation suffer from low ID fidelity and coarse-grained facial attribute control. To address these limitations, we propose ZeroIDβ€”the first zero-shot identity-preserving high-fidelity portrait generation framework. ZeroID jointly models identity and geometric priors via a face discriminator and a 3D-aware keypoint generator; introduces HiFi-Net to enable cross-scale alignment and fusion of multi-reference facial features; and constructs the first large-scale, automatically ID-annotated dataset alongside an SDXL-compatible training pipeline. Extensive experiments demonstrate that ZeroID significantly outperforms state-of-the-art methods in ID similarity (+12.6%) and attribute controllability (+28.4%). Moreover, ZeroID supports plug-and-play adaptation to mainstream portrait generation systems without architectural modification or fine-tuning.

Technology Category

Application Category

πŸ“ Abstract
Recent advancements in diffusion-based technologies have made significant strides, particularly in identity-preserved portrait generation (IPG). However, when using multiple reference images from the same ID, existing methods typically produce lower-fidelity portraits and struggle to customize face attributes precisely. To address these issues, this paper presents HiFi-Portrait, a high-fidelity method for zero-shot portrait generation. Specifically, we first introduce the face refiner and landmark generator to obtain fine-grained multi-face features and 3D-aware face landmarks. The landmarks include the reference ID and the target attributes. Then, we design HiFi-Net to fuse multi-face features and align them with landmarks, which improves ID fidelity and face control. In addition, we devise an automated pipeline to construct an ID-based dataset for training HiFi-Portrait. Extensive experimental results demonstrate that our method surpasses the SOTA approaches in face similarity and controllability. Furthermore, our method is also compatible with previous SDXL-based works.
Problem

Research questions and friction points this paper is trying to address.

Generates high-fidelity portraits from multiple reference images
Preserves identity while precisely customizing face attributes
Fuses multi-face features and aligns with 3D-aware landmarks
Innovation

Methods, ideas, or system contributions that make the work stand out.

Face refiner extracts fine-grained multi-face features for generation.
HiFi-Net fuses features with 3D landmarks to enhance fidelity.
Automated pipeline builds ID dataset for training the model.
πŸ”Ž Similar Papers
No similar papers found.
Y
Yifang Xu
Nanjing University
B
Benxiang Zhai
Nanjing University
Y
Yunzhuo Sun
Dalian University of Technology
M
Ming Li
Nanjing University of Information Science and Technology
Y
Yang Li
Nanjing University
Sidan Du
Sidan Du
Nanjing University
Image Processing and ControlMachine Learning