Autoregression-free video prediction using diffusion model for mitigating error propagation

📅 2025-05-28
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address error accumulation inherent in autoregressive (AR) paradigms for long-term video prediction, this paper proposes the first AR-free diffusion model framework, eliminating frame-by-frame generation and enabling end-to-end direct mapping from contextual frame tuples to future frame tuples. Our key contributions are: (1) a novel explicit motion-feature-driven motion prediction module that disentangles and models dynamic priors; and (2) a joint training strategy combining tuple-level generation with continuity regularization to ensure temporal coherence and contextual consistency. Evaluated on KTH and BAIR benchmarks, our method achieves significant improvements over state-of-the-art approaches—up to +1.2 dB in PSNR and +0.03 in SSIM for distant future frames—effectively mitigating error propagation while preserving both visual fidelity and temporal stability.

📝 Abstract
Existing long-term video prediction methods often rely on an autoregressive video prediction mechanism. However, this approach suffers from error propagation, particularly in distant future frames. To address this limitation, this paper proposes the first AutoRegression-Free (ARFree) video prediction framework using diffusion models. Unlike autoregressive mechanisms, ARFree directly predicts any future frame tuple from the context frame tuple. The proposed ARFree consists of two key components: 1) a motion prediction module that predicts future motion using motion features extracted from the context frame tuple; and 2) a training method that improves motion continuity and contextual consistency between adjacent future frame tuples. Our experiments on two benchmark datasets show that the proposed ARFree video prediction framework outperforms several state-of-the-art video prediction methods.
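The core distinction the abstract draws can be illustrated with a minimal toy sketch: an autoregressive baseline feeds each predicted frame back as input (so errors compound), while an AR-free predictor maps the original context tuple directly to each future tuple. The `predict_next` and `predict_tuple` stand-ins below are hypothetical placeholders, not the paper's diffusion networks.

```python
import numpy as np

def autoregressive_rollout(predict_next, context, n_future):
    """Baseline AR rollout: each predicted frame is fed back as input,
    so any prediction error propagates to all later frames."""
    frames = list(context)
    for _ in range(n_future):
        frames.append(predict_next(frames[-len(context):]))
    return frames[len(context):]

def arfree_predict(predict_tuple, context, n_future, tuple_size):
    """AR-free prediction (sketch): every future frame tuple is mapped
    directly from the *original* context tuple; no predicted frame is
    ever re-used as model input."""
    futures = []
    for start in range(0, n_future, tuple_size):
        # the tuple index k tells the model how far ahead to predict
        futures.extend(predict_tuple(context, start // tuple_size))
    return futures[:n_future]

# Toy stand-in "models" on 8x8 frames (assumptions for illustration only)
H = W = 8

def predict_next(recent):
    # returns a single next frame from the most recent frames
    return recent[-1] + 0.1

def predict_tuple(context, k):
    # returns tuple_size frames for the k-th future tuple at once
    return [context[-1] + 0.1 * (k + 1) + 0.01 * i for i in range(2)]

context = [np.zeros((H, W)) for _ in range(2)]
ar_frames = autoregressive_rollout(predict_next, context, 6)
arfree_frames = arfree_predict(predict_tuple, context, 6, tuple_size=2)
```

The key structural point: `arfree_predict` only ever calls the model on the clean `context`, which is exactly the mechanism the paper credits for avoiding error accumulation in distant frames.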
Problem

Research questions and friction points this paper is trying to address.

Mitigates error propagation in autoregressive video prediction
Proposes AutoRegression-Free framework using diffusion models
Improves motion continuity and contextual consistency
Innovation

Methods, ideas, or system contributions that make the work stand out.

Autoregression-free video prediction framework
Diffusion models for future frame prediction
Motion and contextual consistency training
Woonho Ko
Dept. of Electrical & Computer Engn., Sungkyunkwan University (SKKU), Suwon 16419, South Korea
Jin Bok Park
Dept. of Electrical & Computer Engn., Sungkyunkwan University (SKKU), Suwon 16419, South Korea
Il Yong Chun
Associate Professor of EEE, AI, ECE, ADE, SCE, DCE, & CNIR, Sungkyunkwan University
Artificial intelligence · Computer vision · Computational imaging