Show and Polish: Reference-Guided Identity Preservation in Face Video Restoration

πŸ“… 2025-07-14
πŸ“ˆ Citations: 0
✨ Influential: 0
πŸ€– AI Summary
To address identity feature loss and the emergence of β€œgeneric faces” in severely degraded face video restoration, this paper proposes an identity-aware diffusion model framework. Methodologically: (1) a decoupled cross-attention mechanism is designed to inject identity information using a high-quality reference image as a visual prior; (2) feedback learning coupled with a cosine-similarity reward is introduced to suppress intra-sequence identity drift; (3) exponentially weighted inter-frame fusion and multi-stream negative prompting are adopted to mitigate temporal inconsistency and enhance facial detail generation. Evaluated on both synthetic and real-world datasets, the method achieves significant improvements over existing state-of-the-art approaches in restoration quality, identity fidelity, and long-video temporal consistency, demonstrating strong potential for practical deployment.
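The paper does not include an implementation in this summary, but the decoupled cross-attention idea can be sketched in a few lines: the denoiser's queries attend to the text prompt and the reference-image identity features through *separate* key/value projections, and the two outputs are fused additively. All names here (`decoupled_cross_attention`, `id_scale`, the weight-dict keys) are illustrative, not from the paper:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(q, feats, wk, wv):
    """Single-head cross-attention: queries attend to an external feature sequence."""
    k, v = feats @ wk, feats @ wv
    scores = q @ k.T / np.sqrt(q.shape[-1])
    return softmax(scores) @ v

def decoupled_cross_attention(q, text_feats, id_feats, w, id_scale=0.6):
    """Decoupled cross-attention (sketch): separate K/V projections for the
    text prompt and for the reference-image identity features; the identity
    branch is scaled and added to the text branch."""
    out_text = cross_attention(q, text_feats, w["wk_t"], w["wv_t"])
    out_id = cross_attention(q, id_feats, w["wk_i"], w["wv_i"])
    return out_text + id_scale * out_id
```

Setting `id_scale=0` recovers plain text-conditioned attention, which is why this decoupling lets identity strength be tuned without retraining the text branch.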

πŸ“ Abstract
Face Video Restoration (FVR) aims to recover high-quality face videos from degraded versions. Traditional methods struggle to preserve fine-grained, identity-specific features when degradation is severe, often producing average-looking faces that lack individual characteristics. To address these challenges, we introduce IP-FVR, a novel method that leverages a high-quality reference face image as a visual prompt to provide identity conditioning during the denoising process. IP-FVR incorporates semantically rich identity information from the reference image using decoupled cross-attention mechanisms, ensuring detailed and identity-consistent results. For intra-clip identity drift (within 24 frames), we introduce an identity-preserving feedback learning method that combines cosine similarity-based reward signals with suffix-weighted temporal aggregation. This approach effectively minimizes drift within sequences of frames. For inter-clip identity drift, we develop an exponential blending strategy that aligns identities across clips by iteratively blending frames from previous clips during the denoising process. This method ensures consistent identity representation across different clips. Additionally, we enhance the restoration process with a multi-stream negative prompt, guiding the model's attention to relevant facial attributes and minimizing the generation of low-quality or incorrect features. Extensive experiments on both synthetic and real-world datasets demonstrate that IP-FVR outperforms existing methods in both quality and identity preservation, showcasing its substantial potential for practical applications in face video restoration.
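The inter-clip exponential blending described in the abstract can be illustrated with a small sketch: the first frames of the current clip are mixed with the last frames of the previous clip, with the previous clip's weight decaying exponentially per frame. The function name, the `decay` schedule, and the overlap convention are assumptions for illustration, not the paper's exact formulation:

```python
import numpy as np

def exponential_blend(prev_tail, cur_head, decay=0.5):
    """Blend the trailing frames of the previous clip into the leading frames
    of the current clip. The previous clip's influence decays exponentially,
    so identity carries over while the current clip gradually takes over."""
    blended = np.empty_like(cur_head)
    for i in range(len(prev_tail)):
        w = decay ** (i + 1)  # previous-clip weight falls off frame by frame
        blended[i] = w * prev_tail[i] + (1 - w) * cur_head[i]
    return blended
```

In the paper this blending is applied iteratively inside the denoising loop rather than once on final frames; the sketch only shows the weighting scheme.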
Problem

Research questions and friction points this paper is trying to address.

Preserve identity-specific features in degraded face videos
Minimize intra-clip identity drift within frame sequences
Align identities across clips to ensure consistency
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses reference image for identity conditioning
Employs decoupled cross-attention mechanisms
Applies identity-preserving feedback learning
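The feedback-learning reward in the last bullet can be sketched as a suffix-weighted average of per-frame cosine similarities to the reference identity embedding, so later frames (where drift accumulates) count more. The names `identity_reward` and `gamma`, and the exact weighting, are hypothetical, assuming face embeddings from some recognition model:

```python
import numpy as np

def cosine_sim(a, b):
    """Cosine similarity between two identity embeddings."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def identity_reward(frame_embs, ref_emb, gamma=0.9):
    """Suffix-weighted identity reward (sketch): each frame is scored by its
    cosine similarity to the reference embedding; weights grow toward the end
    of the clip so late-sequence drift is penalized most."""
    T = len(frame_embs)
    weights = np.array([gamma ** (T - 1 - t) for t in range(T)])
    weights /= weights.sum()
    sims = np.array([cosine_sim(f, ref_emb) for f in frame_embs])
    return float(weights @ sims)
```

A reward of 1.0 means every frame's embedding matches the reference exactly; during feedback learning the model would be updated to maximize this signal.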