Improving Video Generation with Human Feedback

📅 2025-01-23
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
To address motion discontinuity and poor text-video alignment in video generation, this work constructs a large-scale, multi-dimensional human preference dataset and proposes VideoReward—the first video-specific, multi-dimensional reward model. It also introduces human feedback into flow-matching-based video generation for the first time. It further designs three reinforcement learning (RL) alignment algorithms: Flow-DPO, a training-time optimization method based on direct preference optimization tailored to flow models; Flow-RWR, which employs reward-weighted regression; and Flow-NRG, an inference-time mechanism enabling multi-objective, interpretable quality control via noise-level reward guidance. Flow-DPO is the first DPO variant adapted to flow models, while Flow-NRG introduces the novel concept of noise-level reward guidance. Experiments demonstrate that VideoReward significantly outperforms existing video reward models; Flow-DPO surpasses supervised fine-tuning and Flow-RWR in motion coherence and semantic consistency; and Flow-NRG enables flexible, user-controllable, and interpretable quality modulation.
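To give a rough sense of the Flow-DPO idea, here is a minimal numeric sketch (not the paper's implementation): DPO-style objectives reward the trained model for fitting the human-preferred sample better, relative to a frozen reference model, than it fits the rejected sample. For flow models, per-sample fit can be measured by the flow-matching velocity error. The function name, the scalar-error simplification, and the `beta` scaling below are illustrative assumptions.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def flow_dpo_loss(err_w_theta, err_w_ref, err_l_theta, err_l_ref, beta=1.0):
    """Hypothetical Flow-DPO-style preference loss on flow-matching errors.

    err_w_theta / err_w_ref: velocity-prediction error of the trained /
        reference model on the human-preferred (winning) video.
    err_l_theta / err_l_ref: the same errors on the rejected (losing) video.
    Loss shrinks when the trained model improves on the winner more than
    on the loser, relative to the reference model.
    """
    diff_w = err_w_theta - err_w_ref  # improvement on the preferred sample
    diff_l = err_l_theta - err_l_ref  # improvement on the rejected sample
    return -np.log(sigmoid(-beta * (diff_w - diff_l)))
```

For example, if the trained model fits the preferred video better than the reference does (and the rejected video equally well), the loss drops below the neutral value of log 2.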

📝 Abstract
Video generation has achieved significant advances through rectified flow techniques, but issues like unsmooth motion and misalignment between videos and prompts persist. In this work, we develop a systematic pipeline that harnesses human feedback to mitigate these problems and refine the video generation model. Specifically, we begin by constructing a large-scale human preference dataset focused on modern video generation models, incorporating pairwise annotations across multiple dimensions. We then introduce VideoReward, a multi-dimensional video reward model, and examine how annotations and various design choices impact its rewarding efficacy. From a unified reinforcement learning perspective aimed at maximizing reward with KL regularization, we introduce three alignment algorithms for flow-based models by extending those from diffusion models. These include two training-time strategies—direct preference optimization for flow (Flow-DPO) and reward-weighted regression for flow (Flow-RWR)—and an inference-time technique, Flow-NRG, which applies reward guidance directly to noisy videos. Experimental results indicate that VideoReward significantly outperforms existing reward models, and Flow-DPO demonstrates superior performance compared to both Flow-RWR and standard supervised fine-tuning methods. Additionally, Flow-NRG lets users assign custom weights to multiple objectives during inference, meeting personalized video quality needs. Project page: https://gongyeliu.github.io/videoalign.
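To make the multi-objective inference-time guidance concrete, the following is a minimal sketch (the function name and the linear-combination form are assumptions, not the paper's exact procedure): at each denoising step, gradients of several reward heads evaluated on the noisy latent are combined with user-chosen weights and added to the model's predicted velocity.

```python
import numpy as np

def guided_velocity(v, reward_grads, weights, scale=0.1):
    """Hypothetical Flow-NRG-style guidance step.

    v: model-predicted velocity at the current noise level.
    reward_grads: per-objective reward gradients w.r.t. the noisy latent
        (e.g. one for motion quality, one for text-video alignment).
    weights: user-chosen weights expressing which objectives to favor.
    """
    combined = np.zeros_like(v)
    for w, g in zip(weights, reward_grads):
        combined += w * g  # weighted sum of objective gradients
    return v + scale * combined
```

Setting a weight to zero simply switches that objective off, which is what makes the trade-off user-controllable at inference time without retraining.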
Problem

Research questions and friction points this paper is trying to address.

Video Production
Motion Smoothness
Content-Prompt Alignment
Innovation

Methods, ideas, or system contributions that make the work stand out.

Video Quality Optimization
Human Feedback
Personalized Adjustment
Jie Liu
The Chinese University of Hong Kong, Shanghai AI Laboratory
Gongye Liu
The Hong Kong University of Science and Technology
Image/Video Generation
Jiajun Liang
MEGVII, KLINGAI
Deep Learning, Computer Vision, Vision-Language Learning, Generative Model
Ziyang Yuan
Tsinghua University, Kuaishou Technology
Xiaokun Liu
Kuaishou Technology
Mingwu Zheng
Seed Edge
Diffusion Models, 3D Rendering & Modeling
Xiele Wu
Kuaishou Technology, Shanghai Jiao Tong University
Qiulin Wang
Kuaishou Technology
Wenyu Qin
Harbin Institute of Technology
Control
Menghan Xia
Kuaishou Technology
Xintao Wang
Kuaishou Technology
Xiaohong Liu
Shanghai Jiao Tong University
Fei Yang
Kuaishou Technology
Pengfei Wan
Head of Kling Video Generation Models, Kuaishou Technology
Generative Models, Computer Vision, Multimodal AI, Computer Graphics
Di Zhang
Kuaishou Technology
Kun Gai
Senior Director & Researcher, Alibaba Group
Machine Learning, Computational Advertising
Yujiu Yang
SIGS, Tsinghua University
Machine Learning, Natural Language Processing, Computer Vision
Wanli Ouyang
The Chinese University of Hong Kong, Shanghai AI Laboratory