🤖 AI Summary
Existing diffusion-based video restoration methods achieve high visual fidelity but incur substantial computational overhead due to multi-step iterative inference. While one-step distillation has advanced image restoration, it faces critical challenges in high-resolution real-world video restoration, including window-boundary artifacts, training instability, and insufficient reconstruction quality. To address these issues, we propose SeedVR2, a one-step diffusion-based video restoration framework. Our approach introduces an adaptive window attention mechanism that adjusts the window size to the output resolution, mitigating discontinuities at window boundaries during long-sequence modeling; designs a lightweight feature-matching loss to stabilize training without notably reducing efficiency; and incorporates adversarial post-training against real data to enable efficient, high-fidelity reconstruction. Extensive experiments demonstrate that our method matches or surpasses state-of-the-art multi-step diffusion models using only a single inference step, drastically reducing computational cost and enabling practical high-resolution video restoration in real-world scenarios.
📝 Abstract
Recent advances in diffusion-based video restoration (VR) demonstrate significant improvements in visual quality, yet incur a prohibitive computational cost during inference. While several distillation-based approaches have shown the potential of one-step image restoration, extending them to VR remains challenging and underexplored, particularly for high-resolution video in real-world settings. In this work, we propose a one-step diffusion-based VR model, termed SeedVR2, which performs adversarial VR training against real data. To handle challenging high-resolution VR within a single step, we introduce several enhancements to both the model architecture and the training procedure. Specifically, we propose an adaptive window attention mechanism in which the window size is dynamically adjusted to fit the output resolution, avoiding the window inconsistency observed in high-resolution VR when window attention uses a predefined window size. To stabilize and improve adversarial post-training for VR, we further verify the effectiveness of a series of losses, including a proposed feature matching loss, without significantly sacrificing training efficiency. Extensive experiments show that SeedVR2 achieves comparable or even better performance than existing VR approaches in a single step.
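To make the adaptive-window idea concrete, here is a minimal sketch (not the authors' code; the function name and default window size are illustrative assumptions) of how a window size could be chosen per spatial dimension so that windows tile the feature map evenly, avoiding the truncated boundary windows that a fixed, predefined window size produces at high resolutions:

```python
def adaptive_window_size(resolution: int, base_window: int = 8) -> int:
    """Pick the divisor of `resolution` closest to `base_window`.

    A fixed window (e.g. 8) may not divide a high-resolution feature
    map evenly, leaving partial windows at the boundary; choosing the
    nearest exact divisor keeps the window tiling seamless.
    """
    divisors = [d for d in range(1, resolution + 1) if resolution % d == 0]
    return min(divisors, key=lambda d: abs(d - base_window))


# Example: a 64-wide feature map keeps the base window of 8,
# while a 45-wide map snaps to the nearest divisor, 9.
print(adaptive_window_size(64), adaptive_window_size(45))
```

In practice the window size would be resolved once per output resolution before partitioning the feature map for attention, so that every window is full-sized and no boundary artifacts arise from partial windows.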