🤖 AI Summary
Existing Group Relative Policy Optimization (GRPO) methods struggle to leverage reward signals effectively in image-to-video generation, limiting both visual quality and temporal consistency. This work proposes TAGRPO, a framework that introduces a trajectory-alignment mechanism in the latent space. By generating rollout videos from shared initial noise to form positive and negative sample pairs, TAGRPO uses contrastive learning to reinforce high-reward trajectories while suppressing low-reward ones. A video memory bank further improves sample diversity and training efficiency. Combining GRPO, flow matching, and contrastive learning, TAGRPO significantly outperforms DanceGRPO on image-to-video generation, producing videos with superior visual fidelity and improved temporal coherence.
📝 Abstract
Recent studies have demonstrated the efficacy of integrating Group Relative Policy Optimization (GRPO) into flow matching models, particularly for text-to-image and text-to-video generation. However, we find that directly applying these techniques to image-to-video (I2V) models often fails to yield consistent reward improvements. To address this limitation, we present TAGRPO, a robust post-training framework for I2V models inspired by contrastive learning. Our approach is grounded in the observation that rollout videos generated from identical initial noise provide superior guidance for optimization. Leveraging this insight, we propose a novel GRPO loss applied to intermediate latents, encouraging direct alignment with high-reward trajectories while maximizing distance from low-reward counterparts. Furthermore, we introduce a memory bank for rollout videos to enhance diversity and reduce computational overhead. Despite its simplicity, TAGRPO achieves significant improvements over DanceGRPO in I2V generation.
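The core idea — score rollouts sharing the same initial noise with group-relative advantages, then pull intermediate latents toward high-reward trajectories and push them away from low-reward ones — can be sketched as follows. This is an illustrative approximation only, not the paper's actual loss: the function names, the advantage normalization, and the plain squared-distance alignment term are all assumptions made for the sketch.

```python
import numpy as np

def group_relative_advantages(rewards):
    # GRPO-style advantages: normalize rewards within the rollout group
    # (reward minus group mean, divided by group standard deviation).
    r = np.asarray(rewards, dtype=np.float64)
    return (r - r.mean()) / (r.std() + 1e-8)

def trajectory_alignment_loss(policy_latent, rollout_latents, rewards):
    """Hypothetical contrastive objective over intermediate latents.

    All rollouts are assumed to start from the same initial noise, so
    their latents are directly comparable. Positive-advantage rollouts
    attract the policy's latent; negative-advantage rollouts repel it.
    """
    adv = group_relative_advantages(rewards)
    # Squared L2 distance from the policy latent to each rollout's latent.
    dist = np.sum((rollout_latents - policy_latent) ** 2, axis=1)
    # Weighting distances by advantages pulls toward high-reward
    # trajectories (minimize distance) and pushes from low-reward ones.
    return float(np.mean(adv * dist))
```

In this toy form, a policy latent near the high-reward rollout yields a lower loss than one near the low-reward rollout, which is the alignment behavior the abstract describes; the real method would apply such a term across diffusion/flow timesteps with gradients through the policy network.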