🤖 AI Summary
To address frame-wise drift accumulation and high computational cost in video generation, this paper proposes an unconditional, streaming video generation framework based on a dual-stream neural ordinary differential equation (ODE). Methodologically, it couples two objectives: a primary path maps past frames directly to future frames, while an auxiliary path jointly learns to correct the residual errors that accumulate along the way. These accumulated errors are suppressed by injecting noise during training and optimizing both paths under a bilinear objective. Because each new frame is generated from the previous frame rather than from noise, the method avoids the noise-to-frame mapping of conventional diffusion models and needs substantially fewer ODE solver steps. Experiments on multiple video datasets show that the method generates videos faster than a baseline conditional diffusion model while achieving competitive visual quality, demonstrating an efficient, robust, streaming-capable ODE-based approach to video generation.
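The noise-injection idea can be illustrated with a minimal sketch. It assumes a flow-matching-style objective with a straight-line path from a noised previous frame to the clean next frame. This is a swapped-in simplification for illustration, not the paper's bilinear objective, and the names `make_training_example`, `flow_loss`, and `sigma` are hypothetical.

```python
import numpy as np

def make_training_example(prev_frame, next_frame, sigma, rng):
    """Build one hypothetical training example (assumed straight-line path).

    The ODE starts from a *noised* previous frame instead of pure noise, and
    its endpoint is the clean next frame, so the learned velocity must both
    advance time and remove the injected, drift-like corruption.
    """
    eps = rng.standard_normal(prev_frame.shape)
    x0 = prev_frame + sigma * eps      # noised source: previous frame
    x1 = next_frame                    # clean target: next frame
    t = rng.uniform()                  # random time in [0, 1)
    xt = (1.0 - t) * x0 + t * x1       # point on the straight path
    v_target = x1 - x0                 # constant velocity along that path
    return xt, t, v_target

def flow_loss(velocity_model, prev_frame, next_frame, sigma, rng):
    """Mean squared error between predicted and target velocity."""
    xt, t, v_target = make_training_example(prev_frame, next_frame, sigma, rng)
    v_pred = velocity_model(xt, t)
    return float(np.mean((v_pred - v_target) ** 2))
```

With `sigma > 0`, a model that drives this loss to zero has learned to map a corrupted past frame onto a clean future frame, which is the intuition behind training against accumulated drift.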
📝 Abstract
We propose a novel generative video model that robustly learns temporal change as a neural Ordinary Differential Equation (ODE) flow, using a bilinear objective that combines two aspects. The first is to map past video frames directly into future ones. Previous work has mapped noise to new frames, a more computationally expensive process. Unfortunately, starting from the previous frame instead of from noise is more prone to drifting errors. Hence, as the second aspect, we jointly learn to remove the accumulated errors by adding noise during training. We demonstrate unconditional video generation in a streaming manner on various video datasets, all at quality competitive with a baseline conditional diffusion model but at higher speed, i.e., with fewer ODE solver steps.
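Streaming generation with a few solver steps per frame could look roughly like the sketch below. It assumes a trained velocity model `velocity_model(x, t)`, a first frame that is already given, and a fixed-step explicit Euler solver over t in [0, 1]. All names and the solver choice are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def integrate_frame(velocity_model, frame, num_steps):
    """Integrate dx/dt = v(x, t) from t=0 to t=1 with explicit Euler steps."""
    x = frame.copy()
    dt = 1.0 / num_steps
    for k in range(num_steps):
        t = k * dt
        x = x + dt * velocity_model(x, t)
    return x

def stream_frames(velocity_model, first_frame, num_frames, num_steps=4):
    """Yield frames one at a time; each new frame starts from the previous one.

    `first_frame` is assumed to be given (hypothetical setup); the loop never
    restarts from noise, so per-frame cost is just `num_steps` model calls.
    """
    frame = first_frame
    for _ in range(num_frames):
        frame = integrate_frame(velocity_model, frame, num_steps)
        yield frame
```

Because each call integrates only from the previous frame to the next, `num_steps` directly sets the per-frame solver cost, and the generator hands out each frame as soon as it is computed, which is what makes the loop streaming.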