Reward-Forcing: Autoregressive Video Generation with Reward Feedback

📅 2026-01-23

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work proposes a teacher-free, reward-guided framework for autoregressive video generation. Existing autoregressive methods are usually adapted from bidirectional models and depend on high-quality teacher models. That dependence limits performance and scalability, particularly when no strong autoregressive teacher is available, and output quality typically lags behind bidirectional models. The proposed approach instead uses reward signals to guide generation, which simplifies the training pipeline while preserving visual fidelity and temporal consistency. On the VBench benchmark, the method achieves a total score of 84.92, compared with 84.31 for state-of-the-art autoregressive methods that require significant heterogeneous distillation.

📝 Abstract

While most prior work in video generation relies on bidirectional architectures, recent efforts have sought to adapt these models into autoregressive variants to support near real-time generation. However, such adaptations often depend heavily on teacher models, which can limit performance, particularly in the absence of a strong autoregressive teacher, so the output quality of autoregressive models typically lags behind that of their bidirectional counterparts. In this paper, we explore an alternative approach that uses reward signals to guide the generation process, enabling more efficient and scalable autoregressive generation. This simplifies training while preserving high visual fidelity and temporal consistency. Through extensive experiments on standard benchmarks, we find that our approach performs comparably to existing autoregressive models and, in some cases, surpasses similarly sized bidirectional models by avoiding constraints imposed by teacher architectures. For example, on VBench, our method achieves a total score of 84.92, closely matching state-of-the-art autoregressive methods that score 84.31 but require significant heterogeneous distillation.

Problem

Research questions and friction points this paper is trying to address.

autoregressive video generation

teacher model dependency

video generation quality

reward feedback

Innovation

Methods, ideas, or system contributions that make the work stand out.

Reward-Forcing

autoregressive video generation

reward feedback

teacher-free training