Aligning Anime Video Generation with Human Feedback

📅 2025-04-14

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Anime video generation suffers from motion distortion, flickering artifacts, and misalignment with human preferences, largely because anime data is scarce and anime motion patterns are unusual. Existing reward models, designed primarily for real-world videos, fail to capture the appearance and consistency requirements specific to anime. To address this, the authors introduce three components. The first is a multi-dimensional human feedback reward dataset for anime videos (30K samples), which they describe as the first of its kind. The second is AnimeReward, an anime-specialized reward model that uses specialized vision-language models for different evaluation dimensions. The third is Gap-Aware Preference Optimization (GAPO), a training method that explicitly incorporates preference gaps into the optimization process. In their experiments, AnimeReward outperforms existing reward models, and adding GAPO yields better alignment on both quantitative benchmarks and human evaluations, improving alignment efficiency and reducing flickering and motion inconsistency.
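The summary does not say how AnimeReward combines its per-dimension judgments or how preference pairs are formed from its scores. The following is a minimal sketch under two assumptions: each evaluation dimension yields a single scalar score, and the scores are merged by a weighted sum. `DimensionScores`, `aggregate_reward`, `build_preference_pair`, and the equal default weights are all hypothetical, not the paper's API.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class DimensionScores:
    """Hypothetical per-dimension scores for one generated clip (scale is assumed)."""
    appearance: float   # visual appearance score
    consistency: float  # visual consistency score


def aggregate_reward(s: DimensionScores,
                     w_appearance: float = 0.5,
                     w_consistency: float = 0.5) -> float:
    """Weighted sum of the dimension scores; the equal default weights are an assumption."""
    return w_appearance * s.appearance + w_consistency * s.consistency


def build_preference_pair(
    candidates: List[Tuple[str, DimensionScores]]
) -> Tuple[str, str, float]:
    """Return (winner_id, loser_id, reward_gap) from the best- and worst-scoring clips."""
    if len(candidates) < 2:
        raise ValueError("need at least two candidate clips")
    ranked = sorted(candidates, key=lambda c: aggregate_reward(c[1]), reverse=True)
    (win_id, win_scores), (lose_id, lose_scores) = ranked[0], ranked[-1]
    gap = aggregate_reward(win_scores) - aggregate_reward(lose_scores)
    return win_id, lose_id, gap


if __name__ == "__main__":
    clips = [
        ("clip_a", DimensionScores(appearance=0.8, consistency=0.6)),
        ("clip_b", DimensionScores(appearance=0.4, consistency=0.3)),
        ("clip_c", DimensionScores(appearance=0.7, consistency=0.9)),
    ]
    # clip_c scores highest and clip_b lowest under the equal weights above.
    print(build_preference_pair(clips))
```

Keeping the gap alongside the winner/loser pair matters because a gap-aware objective needs to know how strongly the reward model prefers one clip over the other, not just which clip wins.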

📝 Abstract

Anime video generation faces significant challenges due to the scarcity of anime data and unusual motion patterns, leading to issues such as motion distortion and flickering artifacts, which result in misalignment with human preferences. Existing reward models, designed primarily for real-world videos, fail to capture the unique appearance and consistency requirements of anime. In this work, we propose a pipeline to enhance anime video generation by leveraging human feedback for better alignment. Specifically, we construct the first multi-dimensional reward dataset for anime videos, comprising 30k human-annotated samples that incorporate human preferences for both visual appearance and visual consistency. Based on this, we develop AnimeReward, a powerful reward model that employs specialized vision-language models for different evaluation dimensions to guide preference alignment. Furthermore, we introduce Gap-Aware Preference Optimization (GAPO), a novel training method that explicitly incorporates preference gaps into the optimization process, enhancing alignment performance and efficiency. Extensive experimental results show that AnimeReward outperforms existing reward models, and that the inclusion of GAPO leads to superior alignment in both quantitative benchmarks and human evaluations, demonstrating the effectiveness of our pipeline in enhancing anime video quality. Our dataset and code will be made publicly available.
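The abstract does not give GAPO's objective. As an illustration only, the sketch below starts from the standard Direct Preference Optimization (DPO) loss and multiplies each pair's loss by its reward gap, so that pairs the reward model separates more clearly contribute more to training. This gap weighting is an assumption chosen to make "incorporating preference gaps" concrete; the paper's actual formulation may differ. The function names and the default `beta` are hypothetical.

```python
import math


def log_sigmoid(z: float) -> float:
    """Numerically stable log(sigmoid(z))."""
    if z >= 0:
        return -math.log1p(math.exp(-z))
    return z - math.log1p(math.exp(z))


def dpo_loss(logp_w: float, logp_l: float,
             ref_logp_w: float, ref_logp_l: float,
             beta: float = 0.1) -> float:
    """Standard DPO loss for one pair: winner w, loser l, policy vs. frozen reference."""
    margin = beta * ((logp_w - ref_logp_w) - (logp_l - ref_logp_l))
    return -log_sigmoid(margin)


def gap_weighted_loss(logp_w: float, logp_l: float,
                      ref_logp_w: float, ref_logp_l: float,
                      reward_gap: float, beta: float = 0.1) -> float:
    """Hypothetical gap-aware variant: scale the DPO loss by the (non-negative) reward gap."""
    if reward_gap < 0:
        raise ValueError("reward_gap must be non-negative (winner scored >= loser)")
    return reward_gap * dpo_loss(logp_w, logp_l, ref_logp_w, ref_logp_l, beta)


if __name__ == "__main__":
    # The policy raised the winner's log-prob and lowered the loser's relative to the reference.
    loss = gap_weighted_loss(logp_w=-10.0, logp_l=-15.0,
                             ref_logp_w=-12.0, ref_logp_l=-14.0,
                             reward_gap=0.45)
    print(f"gap-weighted loss: {loss:.4f}")
```

Scaling the whole loss is only one way to use the gap; placing it as a margin inside the sigmoid is another plausible design, and the abstract does not indicate which, if either, GAPO uses.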

Problem

Research questions and friction points this paper is trying to address.

Addresses motion distortion and flickering in anime videos

Improves alignment of anime generation with human preferences

Overcomes limitations of real-world video reward models for anime

Innovation

Methods, ideas, or system contributions that make the work stand out.

Constructs the first multi-dimensional reward dataset for anime videos

Develops AnimeReward model for preference alignment

Introduces GAPO to improve alignment performance and efficiency

🔎 Similar Papers

Alignment is All You Need: A Training-free Augmentation Strategy for Pose-guided Video Generation

2024-08-29 | arXiv.org | Citations: 7