🤖 AI Summary
This work addresses gender bias in text-to-video (T2V) diffusion models, which stems primarily from implicit gender associations in pretrained text encoders, even for gender-neutral prompts. To mitigate this, we propose the first training-free, real-time debiasing framework, which neutralizes bias in prompt embeddings via an anchor-based spherical geodesic transformation and applies this intervention only during early denoising steps, under a dynamic schedule, to preserve temporal consistency. We further establish the first video-level fairness evaluation protocol, combining VideoLLM-based metrics with human assessment, and our analysis identifies the text encoder as the primary source of T2V gender bias. Experiments on Open-Sora demonstrate that our method substantially reduces occupation-related gender bias while maintaining near-original generation quality.
📝 Abstract
Text-to-video (T2V) diffusion models have achieved rapid progress, yet their demographic biases, particularly gender bias, remain largely unexplored. We present FairT2V, a training-free debiasing framework for text-to-video generation that mitigates encoder-induced bias without finetuning. We first analyze demographic bias in T2V models and show that it primarily originates from pretrained text encoders, which encode implicit gender associations even for neutral prompts. We quantify this effect with a gender-leaning score that correlates with bias in generated videos. Based on this insight, FairT2V mitigates demographic bias by neutralizing prompt embeddings via anchor-based spherical geodesic transformations while preserving semantics. To maintain temporal coherence, we apply debiasing only during early identity-forming steps through a dynamic denoising schedule. We further propose a video-level fairness evaluation protocol combining VideoLLM-based reasoning with human verification. Experiments on the modern T2V model Open-Sora show that FairT2V substantially reduces demographic bias across occupations with minimal impact on video quality.
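To make the two mechanisms in the abstract more concrete, the following is a minimal NumPy sketch of how an anchor-based geodesic neutralization and an early-step schedule might look. It is not FairT2V's implementation. The abstract does not specify the formulas, so everything here is an assumption made for illustration: the leaning score is taken as a difference of cosine similarities to two hypothetical gender anchor embeddings, neutralization moves the embedding along the great-circle arc toward the under-represented anchor until that score reaches zero, and the schedule is a fixed cutoff (`early_fraction`) rather than a dynamic one. The function names `gender_leaning`, `neutralize`, and `prompt_embedding_for_step` are likewise hypothetical, and the sketch operates on a single pooled vector rather than a per-token embedding sequence.

```python
import numpy as np


def unit(v):
    """Scale a vector to unit length."""
    v = np.asarray(v, dtype=float)
    return v / np.linalg.norm(v)


def slerp(p, q, t):
    """Point at fraction t along the great-circle arc from unit(p) to unit(q)."""
    p, q = unit(p), unit(q)
    omega = np.arccos(np.clip(np.dot(p, q), -1.0, 1.0))
    if np.sin(omega) < 1e-8:  # parallel or antipodal: the arc is degenerate
        return p.copy()
    return (np.sin((1.0 - t) * omega) * p + np.sin(t * omega) * q) / np.sin(omega)


def gender_leaning(e, anchor_m, anchor_f):
    """Assumed score: cos(e, anchor_m) - cos(e, anchor_f); 0 means no leaning."""
    e = unit(e)
    return float(np.dot(e, unit(anchor_m)) - np.dot(e, unit(anchor_f)))


def neutralize(e, anchor_m, anchor_f, iters=50):
    """Move e along the geodesic toward the opposite anchor until the score is ~0."""
    e = np.asarray(e, dtype=float)
    score = gender_leaning(e, anchor_m, anchor_f)
    if abs(score) < 1e-12:
        return e.copy()
    target = anchor_f if score > 0 else anchor_m
    lo, hi = 0.0, 1.0  # the score changes sign somewhere on [0, 1]
    for _ in range(iters):  # bisection on the interpolation fraction
        mid = 0.5 * (lo + hi)
        s = gender_leaning(slerp(e, target, mid), anchor_m, anchor_f)
        if s * score > 0:  # still leaning the original way
            lo = mid
        else:
            hi = mid
    # Restore the original norm so only the direction changes.
    return np.linalg.norm(e) * slerp(e, target, hi)


def prompt_embedding_for_step(step, num_steps, original, debiased, early_fraction=0.3):
    """Use the debiased embedding for early steps only (assumed fixed cutoff)."""
    cutoff = int(np.ceil(early_fraction * num_steps))
    return debiased if step < cutoff else original


# Toy 3-D example with made-up anchors and a made-up prompt embedding.
anchor_m = np.array([1.0, 0.0, 0.0])
anchor_f = np.array([0.0, 1.0, 0.0])
prompt = np.array([0.8, 0.3, 0.5])
debiased = neutralize(prompt, anchor_m, anchor_f)
```

In a sampling loop, one would call `prompt_embedding_for_step(step, num_steps, prompt, debiased)` at each denoising step, so that conditioning reverts to the original embedding once the assumed early phase has passed.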