Refining Multidimensional Video Reward Models via Disentangled Influence Functions

📅 2026-05-27

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses the challenge of dimensional heterogeneity in multidimensional video reward model training, where supervision signals for the same video exhibit markedly different reliability across evaluation dimensions, rendering conventional global scalar–based data filtering ineffective. The study formally characterizes this phenomenon for the first time and introduces a decoupled influence function framework that separately estimates supervision risk per dimension. Building on this, it devises dimension-decoupled pruning and reweighting strategies to enable fine-grained data refinement. By moving beyond the global filtering paradigm, the proposed approach substantially improves alignment between reward models and human preferences across multiple dimensions, outperforming existing baselines.

📝 Abstract

As Text-to-Video (T2V) generation models continue to evolve, the complexity of video evaluation necessitates a fine-grained assessment across various axes. To address this, recent works have focused on developing Multidimensional Video Reward Models (MVRMs), which decompose the evaluation process to better align with the multifaceted nature of human visual perception. However, training effective MVRMs is fundamentally challenged by the complex nature of video data. In this work, we identify a critical phenomenon termed Dimensional Heterogeneity: the reliability of a training sample can vary substantially across evaluation dimensions, meaning that a sample may provide reliable supervision for one objective while inducing high supervision risk for another. Consequently, prevailing data-centric methods that filter based on global scalar metrics are ill-posed for T2V tasks. To address this, we propose a disentangled influence framework that efficiently estimates dimension-specific supervision risk. Leveraging this framework, we introduce two dimension-disentangled refinement strategies: Dimension-Disentangled Pruning, which removes extreme high-risk samples, and Dimension-Disentangled Reweighting, which softly down-weights high-risk supervision. Extensive experiments demonstrate that our disentangled strategies significantly outperform global filtering baselines, yielding reward models with superior alignment to ground truth.
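
To make the two refinement strategies concrete, here is a minimal Python sketch of per-dimension pruning and soft reweighting. It is an illustration under stated assumptions, not the paper's method: the abstract does not give the influence estimator or the weighting function, so the risk scores are taken as already computed, and the names `prune_mask`, `soft_weights`, `drop_frac`, and `temperature`, as well as the exponential weighting, are hypothetical.

```python
import math

def prune_mask(risk, drop_frac=0.25):
    """Per-dimension pruning (hypothetical sketch).

    risk[i][d] is an assumed, precomputed risk score for sample i on
    dimension d. Returns mask[i][d], which is False for the drop_frac
    highest-risk labels within each dimension and True otherwise.
    """
    n, dims = len(risk), len(risk[0])
    k = int(n * drop_frac)  # labels dropped per dimension
    mask = [[True] * dims for _ in range(n)]
    for d in range(dims):
        # Rank samples by risk on this dimension only, highest first.
        order = sorted(range(n), key=lambda i: risk[i][d], reverse=True)
        for i in order[:k]:
            mask[i][d] = False
    return mask

def soft_weights(risk, temperature=1.0):
    """Per-dimension soft reweighting (hypothetical sketch).

    Maps each risk score to a weight in (0, 1] with exp(-risk / temperature),
    clamping negative risk to zero so low-risk labels keep full weight.
    The exponential form is an assumption chosen for illustration.
    """
    return [[math.exp(-max(r, 0.0) / temperature) for r in row] for row in risk]

# Toy risk matrix: 4 samples, 2 evaluation dimensions (made-up values).
risk = [
    [0.1, 2.0],  # sample 0: low risk on dim 0, high risk on dim 1
    [0.2, 0.1],
    [3.0, 0.0],  # sample 2: high risk on dim 0, low risk on dim 1
    [0.0, 0.3],
]
mask = prune_mask(risk, drop_frac=0.25)
weights = soft_weights(risk)
```

In this toy example, sample 0 is dropped only for dimension 1 and sample 2 only for dimension 0. A filter based on a single global score would have to keep or discard each sample for every dimension at once.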

Problem

Research questions and friction points this paper is trying to address.

Multidimensional Video Reward Models

Dimensional Heterogeneity

Text-to-Video generation

Supervision Risk

Video Evaluation

Innovation

Methods, ideas, or system contributions that make the work stand out.

Dimensional Heterogeneity

Disentangled Influence Functions

Multidimensional Video Reward Models