Show and Polish: Reference-Guided Identity Preservation in Face Video Restoration

πŸ“… 2025-07-14
πŸ“ˆ Citations: 0
✨ Influential: 0
πŸ€– AI Summary
To address identity feature loss and the emergence of β€œgeneric faces” in severely degraded face video restoration, this paper proposes an identity-aware diffusion model framework. Methodologically: (1) a decoupled cross-attention mechanism is designed to inject identity information using a high-quality reference image as a visual prior; (2) feedback learning coupled with a cosine-similarity reward is introduced to suppress intra-sequence identity drift; (3) exponentially weighted inter-frame fusion and multi-stream negative prompting are adopted to mitigate temporal inconsistency and enhance facial detail generation. Evaluated on both synthetic and real-world datasets, the method achieves significant improvements over existing state-of-the-art approaches in restoration quality, identity fidelity, and long-video temporal consistency, demonstrating strong potential for practical deployment.
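The paper does not include an implementation in this summary, but the decoupled cross-attention idea can be sketched in a few lines: the denoiser's queries attend to the text prompt and the reference-image identity features through *separate* key/value projections, and the two outputs are fused additively. All names here (`decoupled_cross_attention`, `id_scale`, the weight-dict keys) are illustrative, not from the paper:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(q, feats, wk, wv):
    """Single-head cross-attention: queries attend to an external feature sequence."""
    k, v = feats @ wk, feats @ wv
    scores = q @ k.T / np.sqrt(q.shape[-1])
    return softmax(scores) @ v

def decoupled_cross_attention(q, text_feats, id_feats, w, id_scale=0.6):
    """Decoupled cross-attention (sketch): separate K/V projections for the
    text prompt and for the reference-image identity features; the identity
    branch is scaled and added to the text branch."""
    out_text = cross_attention(q, text_feats, w["wk_t"], w["wv_t"])
    out_id = cross_attention(q, id_feats, w["wk_i"], w["wv_i"])
    return out_text + id_scale * out_id
```

Setting `id_scale=0` recovers plain text-conditioned attention, which is why this decoupling lets identity strength be tuned without retraining the text branch.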

πŸ“ Abstract
Face Video Restoration (FVR) aims to recover high-quality face videos from degraded versions. Traditional methods struggle to preserve fine-grained, identity-specific features when degradation is severe, often producing average-looking faces that lack individual characteristics. To address these challenges, we introduce IP-FVR, a novel method that leverages a high-quality reference face image as a visual prompt to provide identity conditioning during the denoising process. IP-FVR incorporates semantically rich identity information from the reference image using decoupled cross-attention mechanisms, ensuring detailed and identity-consistent results. For intra-clip identity drift (within 24 frames), we introduce an identity-preserving feedback learning method that combines cosine similarity-based reward signals with suffix-weighted temporal aggregation. This approach effectively minimizes drift within sequences of frames. For inter-clip identity drift, we develop an exponential blending strategy that aligns identities across clips by iteratively blending frames from previous clips during the denoising process. This method ensures consistent identity representation across different clips. Additionally, we enhance the restoration process with a multi-stream negative prompt, guiding the model's attention to relevant facial attributes and minimizing the generation of low-quality or incorrect features. Extensive experiments on both synthetic and real-world datasets demonstrate that IP-FVR outperforms existing methods in both quality and identity preservation, showcasing its substantial potential for practical applications in face video restoration.
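The inter-clip exponential blending described in the abstract can be illustrated with a small sketch: the first frames of the current clip are mixed with the last frames of the previous clip, with the previous clip's weight decaying exponentially per frame. The function name, the `decay` schedule, and the overlap convention are assumptions for illustration, not the paper's exact formulation:

```python
import numpy as np

def exponential_blend(prev_tail, cur_head, decay=0.5):
    """Blend the trailing frames of the previous clip into the leading frames
    of the current clip. The previous clip's influence decays exponentially,
    so identity carries over while the current clip gradually takes over."""
    blended = np.empty_like(cur_head)
    for i in range(len(prev_tail)):
        w = decay ** (i + 1)  # previous-clip weight falls off frame by frame
        blended[i] = w * prev_tail[i] + (1 - w) * cur_head[i]
    return blended
```

In the paper this blending is applied iteratively inside the denoising loop rather than once on final frames; the sketch only shows the weighting scheme.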
Problem

Research questions and friction points this paper is trying to address.

Preserve identity-specific features in degraded face videos
Minimize intra-clip identity drift within frame sequences
Align identities across clips to ensure consistency
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses reference image for identity conditioning
Employs decoupled cross-attention mechanisms
Applies identity-preserving feedback learning
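The feedback-learning reward in the last bullet can be sketched as a suffix-weighted average of per-frame cosine similarities to the reference identity embedding, so later frames (where drift accumulates) count more. The names `identity_reward` and `gamma`, and the exact weighting, are hypothetical, assuming face embeddings from some recognition model:

```python
import numpy as np

def cosine_sim(a, b):
    """Cosine similarity between two identity embeddings."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def identity_reward(frame_embs, ref_emb, gamma=0.9):
    """Suffix-weighted identity reward (sketch): each frame is scored by its
    cosine similarity to the reference embedding; weights grow toward the end
    of the clip so late-sequence drift is penalized most."""
    T = len(frame_embs)
    weights = np.array([gamma ** (T - 1 - t) for t in range(T)])
    weights /= weights.sum()
    sims = np.array([cosine_sim(f, ref_emb) for f in frame_embs])
    return float(weights @ sims)
```

A reward of 1.0 means every frame's embedding matches the reference exactly; during feedback learning the model would be updated to maximize this signal.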