TAGRPO: Boosting GRPO on Image-to-Video Generation with Direct Trajectory Alignment

📅 2026-01-09
🏛️ arXiv.org
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing GRPO methods struggle to effectively leverage reward signals in image-to-video generation, leading to limitations in both visual quality and temporal consistency. This work proposes TAGRPO, a novel framework that introduces a trajectory alignment mechanism in the latent space. By generating rollout videos from shared initial noise to form positive and negative sample pairs, TAGRPO employs contrastive learning to reinforce high-reward trajectories while suppressing low-reward ones. Additionally, a video memory bank is incorporated to enhance sample diversity and training efficiency. Integrating Group Relative Policy Optimization, flow matching, and contrastive learning, TAGRPO significantly outperforms DanceGRPO on image-to-video generation tasks, producing videos with superior visual fidelity and improved temporal coherence.
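The group-relative part of GRPO can be illustrated with a small sketch: rewards for rollouts in one group are standardized against the group mean and standard deviation, and the sign of the result separates higher-reward from lower-reward samples. The function names `group_relative_advantages` and `split_pos_neg` are hypothetical, and the sign-based split is an assumption made for illustration; the summary does not specify how TAGRPO forms its positive and negative pairs.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Standardize rewards within one rollout group (GRPO-style advantage)."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std over the group
    # eps avoids division by zero when all rewards in the group are equal
    return [(r - mu) / (sigma + eps) for r in rewards]


def split_pos_neg(rewards):
    """Hypothetical helper: indices of above- and below-average rollouts.

    Assumption: a rollout counts as 'positive' if its advantage is > 0
    and 'negative' if it is < 0. The paper may use a different rule.
    """
    adv = group_relative_advantages(rewards)
    pos = [i for i, a in enumerate(adv) if a > 0]
    neg = [i for i, a in enumerate(adv) if a < 0]
    return adv, pos, neg
```

With rewards `[1.0, 2.0, 3.0, 4.0]`, the two highest-reward rollouts fall into the positive set and the two lowest into the negative set; a group with identical rewards yields no pairs at all.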

📝 Abstract
Recent studies have demonstrated the efficacy of integrating Group Relative Policy Optimization (GRPO) into flow matching models, particularly for text-to-image and text-to-video generation. However, we find that directly applying these techniques to image-to-video (I2V) models often fails to yield consistent reward improvements. To address this limitation, we present TAGRPO, a robust post-training framework for I2V models inspired by contrastive learning. Our approach is grounded in the observation that rollout videos generated from identical initial noise provide superior guidance for optimization. Leveraging this insight, we propose a novel GRPO loss applied to intermediate latents, encouraging direct alignment with high-reward trajectories while maximizing distance from low-reward counterparts. Furthermore, we introduce a memory bank for rollout videos to enhance diversity and reduce computational overhead. Despite its simplicity, TAGRPO achieves significant improvements over DanceGRPO in I2V generation.
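One way to picture the memory bank mentioned in the abstract is a bounded buffer that keeps recent rollouts so they can be reused rather than regenerated. The sketch below is an assumption-laden illustration, not the authors' implementation: the class name `RolloutMemoryBank`, the stored fields (`noise_seed`, `latents`, `reward`), the first-in-first-out eviction, and the uniform sampling are all hypothetical choices.

```python
from collections import deque
import random


class RolloutMemoryBank:
    """Hypothetical bounded store of past rollouts (FIFO eviction)."""

    def __init__(self, capacity):
        # deque with maxlen drops the oldest entry once capacity is reached
        self._items = deque(maxlen=capacity)

    def add(self, noise_seed, latents, reward):
        # Assumption: each entry records the initial-noise seed, the
        # rollout's latents (any object), and its scalar reward.
        self._items.append((noise_seed, latents, reward))

    def sample(self, k, rng=None):
        """Draw up to k stored rollouts uniformly without replacement."""
        rng = rng or random
        k = min(k, len(self._items))
        return rng.sample(list(self._items), k)

    def __len__(self):
        return len(self._items)
```

Reusing stored rollouts in this way trades some freshness of the samples for fewer generation calls, which matches the abstract's stated goals of more diversity and lower computational overhead.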
Problem

Research questions and friction points this paper is trying to address.

image-to-video generation
Group Relative Policy Optimization
reward improvement
trajectory alignment
flow matching
Innovation

Methods, ideas, or system contributions that make the work stand out.

trajectory alignment
Group Relative Policy Optimization
image-to-video generation
contrastive learning
memory bank
Jin Wang
The University of Hong Kong
Jianxiang Lu
Hunyuan, Tencent
Guangzheng Xu
Hunyuan, Tencent
Comi Chen
Hunyuan, Tencent
Haoyu Yang
Hunyuan, Tencent
Linqing Wang
Hunyuan, Tencent
Peng Chen
Hunyuan, Tencent
Mingtao Chen
Hunyuan, Tencent
Zhichao Hu
Hunyuan, Tencent
Longhuang Wu
Hunyuan, Tencent
Shuai Shao
Tencent
Computer Vision, Multimedia, AIGC
Qinglin Lu
Hunyuan, Tencent
Ping Luo
National University of Defense Technology
Distributed Computing