🤖 AI Summary
To address flickering artifacts, quality degradation in long sequences, and computational inefficiency when generating high-frame-rate videos with pre-trained diffusion models, this paper proposes a training-free frame interpolation framework. Methodologically, it introduces a noise re-injection mechanism and a sliding-window latent denoising strategy, combined with keyframe guidance and iterative optimization in latent space, to synthesize videos at arbitrarily high frame rates while preserving spatiotemporal consistency. The core contribution is temporally aware noise modulation paired with coordinated local-global optimization, performed without modifying the pre-trained model weights, which substantially mitigates structural distortions and luminance flickering in fast-motion scenes. Experiments demonstrate superior performance over state-of-the-art zero-shot video interpolation and generation methods on PSNR, LPIPS, and user studies. The approach combines high visual quality, computational efficiency, and strong generalization, making it suitable for applications with stringent temporal-fidelity requirements, such as VR and real-time rendering.
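To make the two core mechanisms concrete, below is a minimal PyTorch sketch of sliding-window latent denoising with a mid-run noise re-injection. Everything here is an illustrative assumption rather than the paper's implementation: the stand-in `denoise_step`, the `window`/`stride` sizes, and the toy `alpha_bar` value are all hypothetical.

```python
# Minimal sketch: sliding-window latent denoising + noise re-injection.
# All names, shapes, and schedules are assumptions for illustration,
# not the authors' actual API.
import torch

def reinject_noise(latents: torch.Tensor, alpha_bar: float) -> torch.Tensor:
    """Standard forward-diffusion re-noising: blend fresh Gaussian noise
    into partially denoised latents to push them back to a noisier step."""
    noise = torch.randn_like(latents)
    a = torch.tensor(alpha_bar)
    return a.sqrt() * latents + (1.0 - a).sqrt() * noise

def sliding_window_denoise(latents, denoise_step, t, window=16, stride=8):
    """Apply one denoising step over overlapping temporal windows and
    average the overlaps, keeping neighbouring windows consistent."""
    T = latents.shape[0]                          # (frames, C, H, W)
    out = torch.zeros_like(latents)
    count = torch.zeros(T, 1, 1, 1)
    starts = list(range(0, max(T - window, 0) + 1, stride))
    if T > window and starts[-1] + window < T:    # make sure the tail is covered
        starts.append(T - window)
    for s in starts:
        sl = slice(s, s + window)
        out[sl] += denoise_step(latents[sl], t)
        count[sl] += 1
    return out / count.clamp(min=1)

def denoise_step(x, t):
    """Placeholder for a pre-trained video diffusion denoiser."""
    return 0.95 * x

lat = torch.randn(24, 4, 8, 8)                    # toy video latents
for t in range(10):
    lat = sliding_window_denoise(lat, denoise_step, t)
    if t == 4:                                    # one illustrative re-injection
        lat = reinject_noise(lat, alpha_bar=0.9)
```

Averaging the overlapping window halves is one simple way to realize the local-global coordination the summary alludes to; the paper may use a different fusion rule.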
📝 Abstract
Recent advancements in diffusion models have revolutionized video generation, enabling the creation of high-quality, temporally consistent videos. However, generating high frame rate (FPS) videos remains a significant challenge due to issues such as flickering and degradation in long sequences, particularly in fast-motion scenarios. Existing methods often suffer from computational inefficiency and struggle to maintain video quality across extended sequences. In this paper, we present a novel, training-free approach for high-FPS video generation using pre-trained diffusion models. Our method, DiffuseSlide, introduces a new pipeline that leverages keyframes from low-FPS videos and applies innovative techniques, including noise re-injection and sliding window latent denoising, to achieve smooth, consistent video outputs without the need for additional fine-tuning. Through extensive experiments, we demonstrate that our approach significantly improves video quality, offering enhanced temporal coherence and spatial fidelity. The proposed method is not only computationally efficient but also adaptable to various video generation tasks, making it ideal for applications such as virtual reality, video games, and high-quality content creation.
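As a sketch of the keyframe-guided first stage the abstract describes, the snippet below linearly interpolates between consecutive keyframe latents to reach a target frame rate; a denoising pass like the one sketched above would then refine these rough in-betweens. The function name, tensor shapes, and `factor` parameter are hypothetical, chosen only for illustration.

```python
# Hypothetical first stage of a keyframe-guided pipeline: linear
# interpolation between consecutive keyframe latents. Shapes and the
# `factor` parameter are illustrative assumptions.
import torch

def interpolate_keyframe_latents(key_latents: torch.Tensor, factor: int) -> torch.Tensor:
    """Insert `factor - 1` linear in-betweens between each pair of
    keyframe latents, yielding coarse high-FPS latents for refinement."""
    frames = [key_latents[0]]
    for a, b in zip(key_latents[:-1], key_latents[1:]):
        for j in range(1, factor + 1):
            w = j / factor
            frames.append((1.0 - w) * a + w * b)   # w = 1 lands on keyframe b
    return torch.stack(frames)                     # ((K-1)*factor + 1, C, H, W)

keys = torch.randn(8, 4, 8, 8)                     # 8 keyframes from a low-FPS clip
hi_fps = interpolate_keyframe_latents(keys, factor=4)
print(hi_fps.shape)                                # torch.Size([29, 4, 8, 8])
```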