🤖 AI Summary
To address the error accumulation inherent in autoregressive (AR) paradigms for long-term video prediction, this paper proposes the first AR-free diffusion model framework, eliminating frame-by-frame generation and enabling an end-to-end direct mapping from context frame tuples to future frame tuples. The key contributions are: (1) a novel motion prediction module that disentangles and models dynamic priors, driven by explicit motion features; and (2) a joint training strategy combining tuple-level generation with continuity regularization to ensure temporal coherence and contextual consistency. Evaluated on the KTH and BAIR benchmarks, the method achieves significant improvements over state-of-the-art approaches—up to +1.2 dB in PSNR and +0.03 in SSIM for distant future frames—effectively mitigating error propagation while preserving both visual fidelity and temporal stability.
📝 Abstract
Existing long-term video prediction methods often rely on an autoregressive video prediction mechanism. However, this approach suffers from error propagation, particularly in distant future frames. To address this limitation, this paper proposes the first AutoRegression-Free (ARFree) video prediction framework using diffusion models. Unlike autoregressive video prediction mechanisms, ARFree directly predicts any future frame tuple from the context frame tuple. The proposed ARFree consists of two key components: 1) a motion prediction module that predicts future motion using motion features extracted from the context frame tuple; and 2) a training method that improves motion continuity and contextual consistency between adjacent future frame tuples. Our experiments on the KTH and BAIR benchmark datasets show that the proposed ARFree video prediction framework outperforms several state-of-the-art video prediction methods.
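The key distinction between the two prediction mechanisms can be illustrated with a toy sketch (not the paper's diffusion model): an autoregressive predictor feeds each output back as input, so a small per-step bias compounds over the horizon, while an AR-free predictor maps the context tuple directly to each future frame, so the error stays bounded. The functions `step_fn` and `tuple_fn` below are hypothetical one-step and k-steps-ahead predictors for a trivial scalar "video" whose true frame at time t is t.

```python
def autoregressive_predict(context, n_future, step_fn):
    # Frame-by-frame rollout: each prediction is fed back as the next
    # input, so any systematic per-step error compounds over the horizon.
    frames = list(context)
    for _ in range(n_future):
        frames.append(step_fn(frames[-1]))
    return frames[len(context):]

def arfree_predict(context, offsets, tuple_fn):
    # AR-free mapping: every future frame is predicted directly from the
    # context tuple, so per-prediction error does not accumulate.
    return [tuple_fn(context, k) for k in offsets]

# Toy dynamics: the true frame at time t is simply the scalar t.
BIAS = 0.1  # small systematic model error per prediction

def step_fn(prev_frame):
    return prev_frame + 1.0 + BIAS  # biased one-step predictor

def tuple_fn(context, k):
    return context[-1] + k + BIAS  # biased direct k-steps-ahead predictor

context = [0.0, 1.0, 2.0]                  # frames at t = 0, 1, 2
horizon = 10
truth = [3.0 + k for k in range(horizon)]  # frames at t = 3 .. 12

ar = autoregressive_predict(context, horizon, step_fn)
arfree = arfree_predict(context, range(1, horizon + 1), tuple_fn)

ar_err = [abs(p - t) for p, t in zip(ar, truth)]
arfree_err = [abs(p - t) for p, t in zip(arfree, truth)]
# Autoregressive error grows linearly with the horizon (0.1 * k),
# while the AR-free error stays flat at 0.1 for every future frame.
```

This mirrors the motivation stated in the abstract: error propagation is an artifact of the rollout loop itself, and removing the loop removes the compounding, independently of how accurate the per-prediction model is.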