SeedVR: Seeding Infinity in Diffusion Transformer Towards Generic Video Restoration

📅 2025-01-02
📈 Citations: 0
Influential: 0
🤖 AI Summary
Real-world video restoration must handle videos of arbitrary length and resolution under unknown degradations. Existing methods struggle to balance long-term temporal fidelity against spatiotemporal consistency, and their inference is slow. This paper proposes SeedVR, a general-purpose video restoration framework based on a diffusion Transformer. Key contributions include: (1) a shifted window attention mechanism that supports variable-sized spatiotemporal windows near the spatial and temporal boundaries; (2) a causal video autoencoder for efficient spatiotemporal feature compression; and (3) a mixed image/video training strategy coupled with progressive training. SeedVR achieves state-of-the-art performance across synthetic, real-world, and AI-generated video benchmarks, improving long-sequence reconstruction quality (−12.6% LPIPS) and sampling efficiency (2.3× speedup).

📝 Abstract
Video restoration poses non-trivial challenges in maintaining fidelity while recovering temporally consistent details from unknown degradations in the wild. Despite recent advances in diffusion-based restoration, these methods often face limitations in generation capability and sampling efficiency. In this work, we present SeedVR, a diffusion transformer designed to handle real-world video restoration with arbitrary length and resolution. The core design of SeedVR lies in the shifted window attention that facilitates effective restoration on long video sequences. SeedVR further supports variable-sized windows near the boundary of both spatial and temporal dimensions, overcoming the resolution constraints of traditional window attention. Equipped with contemporary practices, including a causal video autoencoder, mixed image and video training, and progressive training, SeedVR achieves highly competitive performance on both synthetic and real-world benchmarks, as well as AI-generated videos. Extensive experiments demonstrate SeedVR's superiority over existing methods for generic video restoration.
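
The idea of variable-sized windows near the boundary can be illustrated with a small partitioning sketch. This is not the authors' implementation: the function names (window_bounds, partition_3d), the window sizes, and the shift values are all hypothetical, and only the index arithmetic is shown, not the attention itself. The sketch assumes that windows touching a boundary are simply allowed to be smaller, so the input does not need to be padded to a multiple of the window size.

```python
from itertools import product


def window_bounds(length, window, shift=0):
    """Split the range [0, length) into consecutive (start, end) windows.

    Windows are `window` long except at the two ends of the axis, where
    they may be shorter. `shift` offsets the window grid, as in shifted
    window attention. All names and defaults here are illustrative.
    """
    if length <= 0 or window <= 0:
        raise ValueError("length and window must be positive")
    shift %= window
    cuts = [0]
    # The first cut sits at `shift` (or at `window` when there is no shift).
    pos = shift if shift > 0 else window
    while pos < length:
        cuts.append(pos)
        pos += window
    cuts.append(length)
    return list(zip(cuts[:-1], cuts[1:]))


def partition_3d(shape, window, shift=(0, 0, 0)):
    """Partition a (T, H, W) volume into 3-D windows, one bound pair per axis."""
    per_axis = [window_bounds(n, w, s) for n, w, s in zip(shape, window, shift)]
    return list(product(*per_axis))


if __name__ == "__main__":
    # A 5 x 6 x 6 volume with 4 x 4 x 4 windows: no axis divides evenly.
    for win in partition_3d((5, 6, 6), (4, 4, 4)):
        print(win)
```

Because every window is described only by its start and end indices, the same routine covers any length or resolution. Calling it a second time with a non-zero shift gives the offset grid that lets information flow across the window borders of the first grid.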

Problem

Research questions and friction points this paper is trying to address.

Video Restoration
Complex Videos
Long Videos

Innovation

Methods, ideas, or system contributions that make the work stand out.

SeedVR
Video Restoration
AI-generated Videos