Aligning Anime Video Generation with Human Feedback

📅 2025-04-14

🤖 AI Summary
Anime video generation suffers from motion distortion, flickering artifacts, and misalignment with human preferences, largely because anime data is scarce and its motion patterns are unusual. Existing reward models, built mainly for real-world video, fail to capture anime-specific appearance and consistency requirements. To address this, we introduce the first multi-dimensional human-feedback reward dataset for anime video (30K annotated samples); propose AnimeReward, a reward model that uses specialized vision-language models for different evaluation dimensions; and design Gap-Aware Preference Optimization (GAPO), a training method that explicitly incorporates preference gaps into the optimization process. Experiments show that AnimeReward outperforms existing reward models, and that adding GAPO yields better alignment in both quantitative benchmarks and human evaluations.

📝 Abstract
Anime video generation faces significant challenges due to the scarcity of anime data and unusual motion patterns, leading to issues such as motion distortion and flickering artifacts, which result in misalignment with human preferences. Existing reward models, designed primarily for real-world videos, fail to capture the unique appearance and consistency requirements of anime. In this work, we propose a pipeline to enhance anime video generation by leveraging human feedback for better alignment. Specifically, we construct the first multi-dimensional reward dataset for anime videos, comprising 30k human-annotated samples that incorporate human preferences for both visual appearance and visual consistency. Based on this, we develop AnimeReward, a powerful reward model that employs specialized vision-language models for different evaluation dimensions to guide preference alignment. Furthermore, we introduce Gap-Aware Preference Optimization (GAPO), a novel training method that explicitly incorporates preference gaps into the optimization process, enhancing alignment performance and efficiency. Extensive experimental results show that AnimeReward outperforms existing reward models, and the inclusion of GAPO leads to superior alignment in both quantitative benchmarks and human evaluations, demonstrating the effectiveness of our pipeline in enhancing anime video quality. Our dataset and code will be publicly available.
Problem

Research questions and friction points this paper is trying to address.

Addresses motion distortion and flickering in anime videos
Improves alignment of anime generation with human preferences
Overcomes limitations of real-world video reward models for anime
Innovation

Methods, ideas, or system contributions that make the work stand out.

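Constructs the first multi-dimensional reward dataset for anime videos (30k human-annotated samples)
Develops AnimeReward, a reward model built on specialized vision-language models, for preference alignment
Introduces GAPO, a training method that incorporates preference gaps to improve alignment performance and efficiency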
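
The abstract says GAPO "explicitly incorporates preference gaps into the optimization process" but gives no formula. The sketch below is only an illustration of what a gap-aware preference loss could look like, not the paper's method: it assumes a DPO-style pairwise loss and scales each pair by the reward gap between the preferred and rejected video. The names PreferencePair, gap_weights, gap_weighted_loss and the parameters beta and alpha are all hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class PreferencePair:
    """One preferred/rejected video pair (hypothetical container)."""
    logp_win: float      # policy log-likelihood of the preferred video
    logp_lose: float     # policy log-likelihood of the rejected video
    ref_logp_win: float  # reference-model log-likelihood, preferred
    ref_logp_lose: float # reference-model log-likelihood, rejected
    reward_win: float    # reward-model score of the preferred video
    reward_lose: float   # reward-model score of the rejected video


def neg_log_sigmoid(x: float) -> float:
    """Numerically stable -log(sigmoid(x))."""
    return max(-x, 0.0) + math.log1p(math.exp(-abs(x)))


def pair_loss(p: PreferencePair, beta: float = 0.1) -> float:
    """Standard DPO-style loss for a single pair (no gap information)."""
    margin = beta * ((p.logp_win - p.ref_logp_win)
                     - (p.logp_lose - p.ref_logp_lose))
    return neg_log_sigmoid(margin)


def gap_weights(pairs: list[PreferencePair], alpha: float = 1.0) -> list[float]:
    """Assumed weighting: larger reward gaps get larger weights.

    Weights are normalized to have mean 1 so the overall loss scale
    stays comparable to the unweighted loss.
    """
    raw = [1.0 + alpha * max(p.reward_win - p.reward_lose, 0.0) for p in pairs]
    mean = sum(raw) / len(raw)
    return [w / mean for w in raw]


def gap_weighted_loss(pairs: list[PreferencePair],
                      beta: float = 0.1,
                      alpha: float = 1.0) -> float:
    """Mean pairwise loss, with each pair scaled by its gap weight."""
    weights = gap_weights(pairs, alpha)
    return sum(w * pair_loss(p, beta) for w, p in zip(weights, pairs)) / len(pairs)
```

With alpha set to 0 every weight is 1 and the function reduces to the plain mean of the pairwise losses, which makes the effect of the gap term easy to isolate when experimenting.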