Identity-GRPO: Optimizing Multi-Human Identity-preserving Video Generation via Reinforcement Learning

📅 2025-10-15
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing video generation methods (e.g., VACE, Phantom) struggle to maintain long-term identity consistency in dynamic multi-person interaction scenes. To address this, we propose Identity-GRPO, a novel framework that introduces Reinforcement Learning from Human Feedback (RLHF) to multi-person video generation for the first time. We construct a large-scale preference dataset explicitly designed for identity consistency and develop a GRPO variant tailored to this objective. Our method jointly leverages a video reward model and paired human annotations/synthetic distortion data to enable end-to-end optimization. Experiments demonstrate that Identity-GRPO achieves up to an 18.9% improvement over baselines on human-perceived identity consistency metrics. Ablation studies confirm the critical roles of both high-quality preference data and the customized GRPO architecture. This work establishes a new paradigm for modeling identity consistency in multi-person video generation.

📝 Abstract
While methods such as VACE and Phantom have advanced video generation for specific subjects in diverse scenarios, they struggle with multi-human identity preservation in dynamic interactions, where consistent identities across multiple characters are critical. To address this, we propose Identity-GRPO, a human feedback-driven optimization pipeline for refining multi-human identity-preserving video generation. First, we construct a video reward model trained on a large-scale preference dataset containing human-annotated and synthetic distortion data, with pairwise annotations focused on maintaining human consistency throughout the video. We then employ a GRPO variant tailored for multi-human consistency, which greatly enhances both VACE and Phantom. Through extensive ablation studies, we evaluate the impact of annotation quality and design choices on policy optimization. Experiments show that Identity-GRPO achieves up to an 18.9% improvement in human consistency metrics over baseline methods, offering actionable insights for aligning reinforcement learning with personalized video generation.
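The reward model described above is trained on pairwise annotations (a preferred vs. a rejected clip for identity consistency). The paper does not give the loss, but pairwise preference training is typically done with a Bradley-Terry objective; a minimal sketch, assuming scalar reward-model scores (the function name and signature are illustrative, not from the paper):

```python
import math

def pairwise_preference_loss(score_preferred: float, score_rejected: float) -> float:
    """Bradley-Terry negative log-likelihood for one annotated pair:
    the reward model should score the identity-consistent clip higher.
    Computed as -log(sigmoid(s_w - s_l)), via log1p for numerical stability."""
    margin = score_preferred - score_rejected
    return math.log1p(math.exp(-margin))
```

When the preferred clip scores well above the rejected one, the loss approaches zero; when the two scores tie, it equals ln 2, so minimizing this loss pushes the model to separate consistent from distorted videos.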
Problem

Research questions and friction points this paper is trying to address.

Optimizing multi-human identity preservation in video generation
Enhancing identity consistency across dynamic human interactions
Improving reinforcement learning for personalized video synthesis
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses reinforcement learning for identity preservation
Trains reward model with human-annotated preference data
Optimizes video generation via policy gradient methods
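The policy-gradient step builds on GRPO, whose defining trait is replacing a learned value critic with group-relative reward normalization: several videos are sampled per prompt, scored by the reward model, and each sample's advantage is its reward standardized within the group. A minimal sketch of that normalization (the paper's variant adds multi-human-consistency tailoring not shown here; names are illustrative):

```python
from statistics import mean, pstdev

def group_relative_advantages(rewards: list[float], eps: float = 1e-8) -> list[float]:
    """GRPO-style advantages: standardize each sampled video's reward
    against the mean/std of its own generation group (no value critic)."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std over the group
    return [(r - mu) / (sigma + eps) for r in rewards]
```

Samples scoring above their group mean get positive advantages and are reinforced; below-mean samples are suppressed, so the policy shifts toward generations the identity-consistency reward model prefers.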