Sequence-Adaptive Video Prediction in Continuous Streams using Diffusion Noise Optimization

📅 2025-11-22

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

To address future frame prediction in continuous video streams, this paper proposes a sequence-adaptive inference method that requires no fine-tuning of model parameters. Given a frozen pre-trained diffusion model, the approach optimizes the noise latents used in diffusion sampling so that predictions adapt continuously to the incoming video sequence. This avoids the computational cost and the catastrophic forgetting associated with conventional parameter fine-tuning, and aims to improve prediction stability and fidelity on long videos. To evaluate adaptation on streams, the paper introduces a dedicated evaluation setting in which the model is adapted and evaluated simultaneously on long continuous videos. Experiments on four datasets (Ego4D, OpenDV-YouTube, UCF-101, and SkyTimelapse) show improved FVD, SSIM, and PSNR, supporting the method's effectiveness across domains.

📝 Abstract

In this work, we investigate diffusion-based video prediction models, which forecast future video frames, for continuous video streams. In this context, the models continuously observe new training samples, and we aim to leverage these samples to improve their predictions. We thus propose an approach that continuously adapts a pre-trained diffusion model to a video stream. Since fine-tuning the parameters of a large diffusion model is too expensive, we refine the diffusion noise during inference while keeping the model parameters frozen, allowing the model to adaptively determine suitable sampling noise. We term the approach Sequence Adaptive Video Prediction with Diffusion Noise Optimization (SAVi-DNO). To validate our approach, we introduce a new evaluation setting on the Ego4D dataset, focusing on simultaneous adaptation and evaluation on long continuous videos. Empirical results demonstrate improved performance in terms of FVD, SSIM, and PSNR on long videos of Ego4D and OpenDV-YouTube, as well as on videos of UCF-101 and SkyTimelapse, showcasing SAVi-DNO's effectiveness.
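The idea of refining only the sampling noise while the model stays frozen can be illustrated with a toy sketch. This is not the SAVi-DNO procedure: the scalar "predictor", the squared-error loss, the learning rate, and every function name below are assumptions chosen to keep the example self-contained. Each frame is predicted first, scored against the frame that then arrives, and only afterwards used to update the noise, which mirrors the simultaneous adaptation and evaluation described in the abstract.

```python
import random


def frozen_predictor(context, noise):
    """Toy stand-in for a frozen model (hypothetical): the constants
    0.9 and 0.5 play the role of parameters and are never updated."""
    return [0.9 * c + 0.5 * z for c, z in zip(context, noise)]


def mse(a, b):
    """Mean squared error between two equally sized frames."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def refine_noise(noise, context, target, lr=1.0, steps=5):
    """Gradient descent on the noise only; the predictor stays fixed."""
    n = len(noise)
    for _ in range(steps):
        pred = frozen_predictor(context, noise)
        # d(MSE)/d(z_i) = 2 * (pred_i - target_i) * 0.5 / n
        grad = [2.0 * (p - t) * 0.5 / n for p, t in zip(pred, target)]
        noise = [z - lr * g for z, g in zip(noise, grad)]
    return noise


def run_stream(num_frames=40, adapt=True, seed=0):
    """Predict each next frame, record the error, then optionally adapt."""
    rng = random.Random(seed)
    frame = [rng.random() for _ in range(4)]
    noise = [rng.gauss(0.0, 1.0) for _ in range(4)]
    errors = []
    for _ in range(num_frames):
        # True stream dynamics; the +0.3 offset is unknown to the predictor.
        next_frame = [0.9 * f + 0.3 for f in frame]
        pred = frozen_predictor(frame, noise)
        errors.append(mse(pred, next_frame))  # evaluate before adapting
        if adapt:
            noise = refine_noise(noise, frame, next_frame)
        frame = next_frame
    return errors
```

Calling `run_stream(adapt=True)` and `run_stream(adapt=False)` with the same seed gives two error curves that start identically. In this toy setting the adapted curve shrinks as the noise absorbs the offset the frozen predictor cannot represent, while the non-adapted curve does not improve.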

Problem

Research questions and friction points this paper is trying to address.

Adapting pre-trained diffusion models to continuous video streams

Optimizing diffusion noise during inference without model fine-tuning

Improving video prediction metrics on long continuous sequences
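Of the reported metrics, PSNR is the simplest to state: it is 10 * log10(MAX^2 / MSE), where MAX is the largest possible pixel value. The sketch below uses the standard definition on flat lists of pixel values; the function name and the default range of 255 are assumptions, and the paper's exact evaluation code is not given in this summary.

```python
import math


def psnr(pred, target, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized frames."""
    mse = sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)
    if mse == 0:
        return math.inf  # identical frames: PSNR is unbounded
    return 10.0 * math.log10(max_value ** 2 / mse)
```

Higher values indicate predictions closer to the ground truth. For a long stream, one common choice is to compute the value per predicted frame and average over the sequence.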

Innovation

Methods, ideas, or system contributions that make the work stand out.

Optimizes diffusion noise during inference

Adapts pre-trained model to video streams

Keeps model parameters frozen for efficiency

🔎 Similar Papers

Pyramidal Flow Matching for Efficient Video Generative Modeling