VFace: A Training-Free Approach for Diffusion-Based Video Face Swapping

📅 2026-02-08
📈 Citations: 0
✨ Influential: 0
📄 PDF
🤖 AI Summary
This work addresses temporal inconsistency and identity distortion in diffusion-based video face swapping by proposing a plug-and-play, training-free approach. The method preserves source identity features through spectral attention interpolation, achieves precise facial alignment via target-structure-guided attention injection, and enhances inter-frame coherence with an optical flow–guided temporal smoothing mechanism. Notably, this is the first approach to seamlessly integrate with image-level diffusion-based face swapping models without requiring fine-tuning or additional training, significantly improving both temporal consistency and visual fidelity in video face swapping. The proposed solution is modular, practical, and readily deployable within existing pipelines.
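The paper does not publish its smoothing equations in this summary, but the optical flow–guided temporal smoothing idea can be sketched as follows: warp the previous frame's features toward the current frame along a dense flow field, then blend the two. The function names, the nearest-neighbour warp, and the blend weight `alpha` are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def warp_with_flow(feat_prev, flow):
    """Warp previous-frame features toward the current frame using a
    dense flow field (nearest-neighbour sampling for simplicity).
    flow[..., 0] is the horizontal offset, flow[..., 1] the vertical one.
    """
    h, w = feat_prev.shape[:2]
    ys, xs = np.mgrid[0:h, 0:w]
    src_y = np.clip(np.round(ys + flow[..., 1]).astype(int), 0, h - 1)
    src_x = np.clip(np.round(xs + flow[..., 0]).astype(int), 0, w - 1)
    return feat_prev[src_y, src_x]

def temporal_smooth(feat_curr, feat_prev, flow, alpha=0.6):
    """Blend current-frame features with flow-aligned previous-frame
    features to suppress frame-to-frame flicker (alpha is illustrative)."""
    aligned_prev = warp_with_flow(feat_prev, flow)
    return alpha * feat_curr + (1 - alpha) * aligned_prev
```

Because the previous frame is aligned before blending, the smoothing suppresses flicker without ghosting moving content, which is the property the summary attributes to the flow-guided mechanism.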

πŸ“ Abstract
We present a training-free, plug-and-play method, namely VFace, for high-quality face swapping in videos. It can be seamlessly integrated with image-based face swapping approaches built on diffusion models. First, we introduce a Frequency Spectrum Attention Interpolation technique to facilitate generation and intact key identity characteristics. Second, we achieve Target Structure Guidance via plug-and-play attention injection to better align the structural features from the target frame to the generation. Third, we present a Flow-Guided Attention Temporal Smoothening mechanism that enforces spatiotemporal coherence without modifying the underlying diffusion model to reduce temporal inconsistencies typically encountered in frame-wise generation. Our method requires no additional training or video-specific fine-tuning. Extensive experiments show that our method significantly enhances temporal consistency and visual fidelity, offering a practical and modular solution for video-based face swapping. Our code is available at https://github.com/Sanoojan/VFace.
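The abstract's Frequency Spectrum Attention Interpolation can be illustrated with a minimal frequency-domain sketch: take the low-frequency band (coarse identity layout) from the source-derived features and the high-frequency band (fine detail) from the generated features, then invert the transform. The function name, the square low-pass mask, and `low_ratio` are assumptions for illustration; the paper applies the idea inside diffusion attention layers, not on raw 2-D maps.

```python
import numpy as np

def spectral_interpolate(feat_src, feat_gen, low_ratio=0.25):
    """Combine the low-frequency band of source-identity features with
    the high-frequency band of generated features (illustrative sketch).
    Both inputs are 2-D feature maps of the same shape.
    """
    # Move to the frequency domain with the DC component centred.
    F_src = np.fft.fftshift(np.fft.fft2(feat_src))
    F_gen = np.fft.fftshift(np.fft.fft2(feat_gen))
    h, w = feat_src.shape
    ys, xs = np.mgrid[0:h, 0:w]
    cy, cx = h // 2, w // 2
    # Square low-pass mask around the centre (an assumed design choice).
    low = (np.abs(ys - cy) <= low_ratio * h / 2) & \
          (np.abs(xs - cx) <= low_ratio * w / 2)
    # Low frequencies from the source, high frequencies from the generation.
    mixed = np.where(low, F_src, F_gen)
    return np.fft.ifft2(np.fft.ifftshift(mixed)).real
```

Interpolating in the spectrum rather than in pixel space lets the coarse identity structure and the fine generated texture be traded off independently, which matches the preservation-versus-fidelity balance the abstract describes.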
Problem

Research questions and friction points this paper is trying to address.

video face swapping
temporal consistency
diffusion models
spatiotemporal coherence
Innovation

Methods, ideas, or system contributions that make the work stand out.

Training-Free
Diffusion-Based Face Swapping
Temporal Consistency
Attention Injection
Flow-Guided Smoothing