One-Forcing: Towards Stable One-Step Autoregressive Video Generation

📅 2026-05-22
📈 Citations: 0
Influential: 0

🤖 AI Summary
Existing few-step autoregressive video generation methods degrade sharply when reduced to a single sampling step: trajectory-style consistency distillation tends to produce weak dynamics, while DMD-based approaches tend to produce blurry frames. This work proposes One-Forcing, which adds an auxiliary GAN loss to the Distribution Matching Distillation (DMD) objective to improve motion fidelity and visual sharpness in one-step generation. One-Forcing reaches a total score of 83.76 on VBench, state of the art among one-step causal video generation methods and competitive with strong many-step approaches. It also trains a one-step framewise autoregressive model stably at roughly one-third of the training cost of the chunkwise model, a setting in which prior methods have failed.
📝 Abstract
Recent advances have substantially improved real-time interactive video generation in the autoregressive regime. However, most existing few-step autoregressive video generation methods, often distilled from a corresponding many-step teacher, default to a 4-step sampling configuration, which still incurs considerable latency during deployment and suffers from severe quality degradation when the number of sampling steps is further reduced, particularly in the one-step setting. Trajectory-style consistency distillation methods often produce videos with weak dynamics, while DMD-based approaches, such as Self-Forcing, tend to yield blurry frames. To address this challenge, we propose One-Forcing, a simple yet effective approach which augments the DMD objective with an auxiliary GAN loss for high-quality and efficient one-step video generation. Experiments on VBench show that One-Forcing achieves a total score of 83.76, establishing state-of-the-art performance among one-step causal video generation methods and remaining competitive with strong many-step approaches. We further demonstrate that one-step framewise autoregressive generation can be achieved stably with merely one-third of the training cost of the chunkwise model, a setting that prior methods have failed to achieve successfully.
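The abstract describes the objective only at a high level: a DMD loss augmented with an auxiliary GAN loss. The following is a minimal sketch of how such a combination could look for the generator, not the paper's implementation. The non-saturating form of the GAN term, the weight `gan_weight`, and all function names are assumptions made for illustration; the DMD term is treated as an already computed scalar.

```python
import math


def softplus(x: float) -> float:
    # Numerically stable log(1 + exp(x)).
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def generator_gan_loss(fake_logits: list[float]) -> float:
    # Non-saturating generator loss (an assumed choice, not taken from the paper):
    # mean of softplus(-D(G(z))) over discriminator logits on generated frames.
    return sum(softplus(-logit) for logit in fake_logits) / len(fake_logits)


def combined_generator_loss(
    dmd_loss: float,
    fake_logits: list[float],
    gan_weight: float = 0.1,  # hypothetical weighting; the source gives no value
) -> float:
    # DMD objective plus a weighted auxiliary GAN term.
    return dmd_loss + gan_weight * generator_gan_loss(fake_logits)


if __name__ == "__main__":
    # Toy numbers only: a placeholder DMD loss and discriminator logits.
    print(combined_generator_loss(1.0, [0.0, 0.0], gan_weight=0.5))
```

In this sketch, generated frames that the discriminator scores as more realistic (higher logits) lower the auxiliary term, which is the kind of pressure toward sharper output that the abstract attributes to the GAN loss.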
Problem

Research questions and friction points this paper is trying to address.

one-step video generation
autoregressive video generation
quality degradation
sampling steps
video dynamics
Innovation

Methods, ideas, or system contributions that make the work stand out.

One-Forcing
one-step autoregressive video generation
distribution matching distillation (DMD)
GAN loss
efficient video synthesis