SoliReward: Mitigating Susceptibility to Reward Hacking and Annotation Noise in Video Generation Reward Models

📅 2025-12-17
📈 Citations: 0
Influential: 0
🤖 AI Summary
Video generation reward models (RMs) suffer from sensitivity to annotation noise, adversarial reward attacks, and architectural limitations. To address these challenges, this paper proposes SoliReward—a robust RM framework. Methodologically: (i) it introduces a novel single-sample binary annotation scheme coupled with cross-prompt pairwise comparison, significantly reducing annotation cost while improving generalization; (ii) it designs a hierarchical progressive query attention mechanism to enhance vision-language alignment; and (iii) it revises the Bradley-Terry loss to accommodate win-tie outcomes and regularize the score distribution of positive samples. Evaluated across multi-dimensional benchmarks—including physical plausibility, subject deformation, and semantic alignment—SoliReward substantially improves RM robustness and evaluation metrics. Moreover, it effectively boosts post-training alignment performance of video generation models, demonstrating superior reliability under noisy or adversarial conditions.

📝 Abstract
Post-training alignment of video generation models with human preferences is a critical goal. Developing effective Reward Models (RMs) for this process faces significant methodological hurdles. Current data collection paradigms, reliant on in-prompt pairwise annotations, suffer from labeling noise. Concurrently, the architectural design of VLM-based RMs, particularly their output mechanisms, remains underexplored. Furthermore, RMs are susceptible to reward hacking during post-training. To mitigate these limitations, we propose SoliReward, a systematic framework for video RM training. Our framework first sources high-quality, cost-efficient data via single-item binary annotations, then constructs preference pairs using a cross-prompt pairing strategy. Architecturally, we employ a Hierarchical Progressive Query Attention mechanism to enhance feature aggregation. Finally, we introduce a modified Bradley-Terry (BT) loss that explicitly accommodates win-tie scenarios. This regularizes the RM's score distribution for positive samples, providing more nuanced preference signals and alleviating over-focus on a small number of top-scoring samples. Our approach is validated on benchmarks evaluating physical plausibility, subject deformation, and semantic alignment, demonstrating improvements both in direct RM evaluation metrics and in the efficacy of post-training video generation models. Code and benchmarks will be made publicly available.
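The single-item-annotation-to-preference-pair pipeline described in the abstract can be sketched as follows. This is a hedged illustration, not the paper's implementation: the function name, the `{"prompt", "video", "label"}` record schema, and the bounded rejection-sampling loop are all assumptions.

```python
import random


def build_cross_prompt_pairs(samples, n_pairs, seed=0):
    """Hypothetical sketch: turn single-item binary labels into
    (positive, negative) preference pairs drawn from *different* prompts.

    `samples` is a list of dicts like {"prompt": str, "video": ..., "label": 0 or 1}.
    """
    rng = random.Random(seed)
    pos = [s for s in samples if s["label"] == 1]
    neg = [s for s in samples if s["label"] == 0]
    pairs = []
    # Bounded rejection sampling: retry when both items share a prompt.
    for _ in range(n_pairs * 10):
        if len(pairs) >= n_pairs or not pos or not neg:
            break
        p, n = rng.choice(pos), rng.choice(neg)
        if p["prompt"] != n["prompt"]:  # cross-prompt constraint
            pairs.append((p, n))
    return pairs
```

Because each pair compares a labeled-good video against a labeled-bad one from a different prompt, annotators only ever judge one video at a time, which is the cost and noise advantage the abstract claims for this scheme.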
Problem

Research questions and friction points this paper is trying to address.

Mitigates reward hacking in video generation post-training alignment
Addresses annotation noise in reward model data collection
Improves architectural design of vision-language reward models
Innovation

Methods, ideas, or system contributions that make the work stand out.

Single-item binary annotations with cross-prompt pairing for data collection
Hierarchical Progressive Query Attention for feature aggregation
Modified Bradley-Terry (BT) loss accommodating win-tie outcomes
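The modified BT loss listed above can be sketched in scalar form. This is an assumption-laden illustration, not the paper's formulation: the tie term and its `margin` value are one common way to extend Bradley-Terry to win-tie outcomes (pulling tied scores together), chosen here only to make the idea concrete.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def bt_win_loss(r_win, r_lose):
    # Standard Bradley-Terry term: -log P(win is preferred over lose).
    return -math.log(sigmoid(r_win - r_lose))


def bt_tie_loss(r_a, r_b, margin=0.5):
    # Hypothetical tie term (sketch, not the paper's exact loss):
    # penalize score gaps larger than a margin, so samples annotated
    # as tied receive similar rewards.
    return max(abs(r_a - r_b) - margin, 0.0)
```

For a winning pair the loss shrinks as the score gap grows; for a tie it is zero inside the margin and grows linearly outside it, which regularizes the score distribution of positive samples instead of pushing a few of them toward extreme scores.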