DiffuseSlide: Training-Free High Frame Rate Video Generation Diffusion

📅 2025-06-02
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address flickering artifacts, quality degradation in long sequences, and computational inefficiency when generating high-frame-rate videos with pre-trained diffusion models, this paper proposes a training-free frame interpolation framework. Methodologically, it introduces a noise re-injection mechanism and a sliding-window latent denoising strategy, combined with keyframe guidance and iterative latent-space optimization, to achieve arbitrary high-FPS synthesis while preserving spatiotemporal consistency. The core contribution is temporally aware noise modulation with local-global collaborative optimization, performed without modifying pre-trained model weights, which significantly mitigates structural distortion and luminance flickering in fast-motion scenes. Experiments demonstrate superior performance over state-of-the-art zero-shot video interpolation and generation methods in PSNR, LPIPS, and user studies. The approach combines high visual quality, computational efficiency, and strong generalization, making it suitable for applications with stringent temporal-fidelity demands such as VR and real-time rendering.

📝 Abstract
Recent advancements in diffusion models have revolutionized video generation, enabling the creation of high-quality, temporally consistent videos. However, generating high frame-rate (FPS) videos remains a significant challenge due to issues such as flickering and degradation in long sequences, particularly in fast-motion scenarios. Existing methods often suffer from computational inefficiencies and limitations in maintaining video quality over extended frames. In this paper, we present a novel, training-free approach for high FPS video generation using pre-trained diffusion models. Our method, DiffuseSlide, introduces a new pipeline that leverages key frames from low FPS videos and applies innovative techniques, including noise re-injection and sliding window latent denoising, to achieve smooth, consistent video outputs without the need for additional fine-tuning. Through extensive experiments, we demonstrate that our approach significantly improves video quality, offering enhanced temporal coherence and spatial fidelity. The proposed method is not only computationally efficient but also adaptable to various video generation tasks, making it ideal for applications such as virtual reality, video games, and high-quality content creation.
Problem

Research questions and friction points this paper is trying to address.

Generating high frame-rate videos without flickering or degradation
Overcoming computational inefficiencies in long video sequences
Maintaining video quality in fast-motion scenarios
Innovation

Methods, ideas, or system contributions that make the work stand out.

Training-free high FPS video generation
Noise re-injection and sliding window denoising
Leverages pre-trained diffusion models
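The pipeline implied by the points above can be sketched as follows. This is a minimal, hypothetical illustration, not the authors' implementation: the function names (`interpolate_keyframes`, `denoise_step`, `diffuse_slide`), the re-injection schedule, and the window/stride values are all illustrative assumptions, and a NumPy placeholder stands in for a real pre-trained diffusion backbone.

```python
# Hypothetical sketch of sliding-window latent denoising with noise
# re-injection, in the spirit of DiffuseSlide. All names and constants are
# illustrative; `denoise_step` is a stand-in for a pre-trained model.
import numpy as np

rng = np.random.default_rng(0)

def interpolate_keyframes(key_latents, factor):
    """Linearly interpolate key-frame latents up to a higher frame rate."""
    n, _ = key_latents.shape
    out = []
    for i in range(n - 1):
        for j in range(factor):
            t = j / factor
            out.append((1 - t) * key_latents[i] + t * key_latents[i + 1])
    out.append(key_latents[-1])
    return np.stack(out)

def denoise_step(latents, t):
    """Placeholder for one reverse-diffusion step of a pre-trained model."""
    return latents * (1.0 - 0.1 * t)  # a real model would predict noise here

def diffuse_slide(key_latents, factor=4, window=8, stride=4,
                  reinject_steps=(0.6, 0.3), num_steps=10):
    latents = interpolate_keyframes(key_latents, factor)
    window = min(window, len(latents))
    for step in range(num_steps):
        t = 1.0 - step / num_steps
        # Noise re-injection: at chosen timesteps, add fresh noise so the
        # model can repair interpolation artifacts instead of locking them in.
        if any(abs(t - s) < 1e-9 for s in reinject_steps):
            latents = latents + 0.1 * rng.standard_normal(latents.shape)
        # Sliding-window denoising with overlap averaging, so adjacent
        # windows agree at their shared frames (temporal consistency).
        starts = list(range(0, len(latents) - window + 1, stride))
        if starts[-1] != len(latents) - window:
            starts.append(len(latents) - window)  # cover the tail frames
        acc = np.zeros_like(latents)
        cnt = np.zeros((len(latents), 1))
        for start in starts:
            acc[start:start + window] += denoise_step(
                latents[start:start + window], t)
            cnt[start:start + window] += 1
        latents = acc / cnt  # average overlapping window predictions
    return latents

keys = rng.standard_normal((4, 16))   # 4 key-frame latents, dim 16
video = diffuse_slide(keys, factor=4)
print(video.shape)                    # (13, 16): (4 - 1) * 4 + 1 frames
```

Note the two design choices the bullets point at: no model weights are touched (only the latents and the sampling loop change), and overlap averaging is what lets local windows produce a globally coherent sequence.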
👥 Authors
Geunmin Hwang
RECON Labs Inc., Department of Artificial Intelligence, Sungkyunkwan University
Hyun-kyu Ko
Sungkyunkwan University
Computer Vision
Younghyun Kim
Department of Artificial Intelligence, Yonsei University
Seungryong Lee
Department of Electrical and Computer Engineering, Sungkyunkwan University
Eunbyung Park
Yonsei University
Computer Vision, Machine Learning, Deep Learning