From Preferences to Prejudice: The Role of Alignment Tuning in Shaping Social Bias in Video Diffusion Models

📅 2025-10-20
📈 Citations: 0
Influential: 0
🤖 AI Summary
Video diffusion models readily encode and amplify societal biases during human preference-based alignment fine-tuning. To address this, we propose VideoBiasEval—a first-of-its-kind framework enabling end-to-end bias tracing across preference data, reward modeling, and video generation. Methodologically, we design an event-driven prompting strategy grounded in a social bias taxonomy, and introduce semantic disentanglement coupled with multi-granularity temporal evaluation metrics to quantify the evolution of gender and racial biases across model variants and frame sequences. Experiments reveal that alignment fine-tuning improves generation fluency yet concurrently exacerbates representational bias, increases temporal stability of biased content, and intensifies visual stereotyping. We identify a critical “bias stabilization–visual smoothing co-occurrence” phenomenon, underscoring the necessity of bias-aware evaluation and intervention in video generation.

📝 Abstract
Recent advances in video diffusion models have significantly enhanced text-to-video generation, particularly through alignment tuning using reward models trained on human preferences. While these methods improve visual quality, they can unintentionally encode and amplify social biases. To systematically trace how such biases evolve throughout the alignment pipeline, we introduce VideoBiasEval, a comprehensive diagnostic framework for evaluating social representation in video generation. Grounded in established social bias taxonomies, VideoBiasEval employs an event-based prompting strategy to disentangle semantic content (actions and contexts) from actor attributes (gender and ethnicity). It further introduces multi-granular metrics to evaluate (1) overall ethnicity bias, (2) gender bias conditioned on ethnicity, (3) distributional shifts in social attributes across model variants, and (4) the temporal persistence of bias within videos. Using this framework, we conduct the first end-to-end analysis connecting biases in human preference datasets, their amplification in reward models, and their propagation through alignment-tuned video diffusion models. Our results reveal that alignment tuning not only strengthens representational biases but also makes them temporally stable, producing smoother yet more stereotyped portrayals. These findings highlight the need for bias-aware evaluation and mitigation throughout the alignment process to ensure fair and socially responsible video generation.
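The abstract's fourth metric, "temporal persistence of bias within videos," can be illustrated with a minimal sketch. The paper's exact formulation is not given here, so the function below is an assumption: it scores persistence as the fraction of frames whose perceived attribute label matches the video-level majority label (1.0 means the same attribute is held across every frame).

```python
from collections import Counter

def temporal_persistence(frame_labels):
    """Fraction of frames whose perceived attribute label agrees with the
    video-level majority label. A hypothetical proxy for the paper's
    'temporal persistence of bias' metric, not its actual definition."""
    majority_label, count = Counter(frame_labels).most_common(1)[0]
    return count / len(frame_labels)

# Hypothetical per-frame attribute labels from an attribute classifier.
print(temporal_persistence(["A", "A", "A", "B"]))  # 0.75
```

Under this reading, the paper's finding that alignment tuning makes biases "temporally stable" would correspond to tuned models scoring closer to 1.0 than their base counterparts.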
Problem

Research questions and friction points this paper is trying to address.

Alignment tuning amplifies social biases in video diffusion models
VideoBiasEval framework evaluates bias evolution in video generation
Biases become temporally stable through human preference-based alignment
Innovation

Methods, ideas, or system contributions that make the work stand out.

Introduces VideoBiasEval diagnostic framework for bias evaluation
Uses event-based prompting to separate content from attributes
Employs multi-granular metrics to analyze bias propagation
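One of the multi-granular metrics above compares distributional shifts in social attributes across model variants. A standard way to quantify such a shift, sketched below under the assumption that each generated video has been assigned a categorical attribute label, is the total variation distance between the empirical attribute distributions of two model variants (the paper may use a different divergence; the label names are placeholders).

```python
from collections import Counter

def attribute_distribution(labels):
    """Empirical distribution over perceived attribute labels (e.g., ethnicity)."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {k: v / total for k, v in counts.items()}

def total_variation(p, q):
    """Total variation distance: 0.5 * sum_x |p(x) - q(x)|.
    0 means identical distributions; 1 means disjoint support."""
    support = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in support)

# Hypothetical labels from a base model (balanced) vs. an alignment-tuned one (skewed).
base_labels = ["A", "A", "B", "B", "C", "C"]
tuned_labels = ["A", "A", "A", "A", "B", "C"]

shift = total_variation(attribute_distribution(base_labels),
                        attribute_distribution(tuned_labels))
print(round(shift, 3))  # 0.333
```

A nonzero shift after alignment tuning, as in this toy example, is the kind of signal the framework traces back through the reward model and preference data.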
Authors

Zefan Cai, Peking University
Haoyi Qiu, UCLA
Haozhe Zhao, University of Illinois Urbana-Champaign
Ke Wan, University of California, San Diego
Jiachen Li, University of California, Santa Barbara
Jiuxiang Gu, Adobe Research
Wen Xiao, Microsoft
Nanyun Peng, University of California, Los Angeles
Junjie Hu, University of Wisconsin–Madison