FairT2V: Training-Free Debiasing for Text-to-Video Diffusion Models

📅 2026-01-28
🤖 AI Summary
This work addresses gender bias in text-to-video (T2V) diffusion models. Our analysis shows that the bias stems primarily from implicit gender associations in pretrained text encoders, which persist even for gender-neutral prompts. To mitigate it, we propose the first training-free, real-time debiasing framework, which neutralizes bias in prompt embeddings via an anchor-based spherical geodesic transformation and dynamically schedules this intervention during the early denoising steps to preserve temporal consistency. We also establish the first video-level fairness evaluation protocol, combining VideoLLM-based metrics with human assessment. Experiments on Open-Sora show that the method substantially reduces occupation-related gender bias while keeping generation quality close to the original.
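To make the "spherical geodesic transformation" concrete, here is a minimal sketch of spherical linear interpolation (slerp), which moves a vector along the great-circle arc toward an anchor. It is an illustration only, not the paper's implementation: the names `slerp`, `prompt_emb`, and `neutral_anchor`, the toy 3-dimensional vectors, and the interpolation weight `t` are all assumptions, and the summary does not specify how anchors are built or how far the embedding is moved.

```python
import math


def normalize(v):
    """Scale v to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


def slerp(a, b, t):
    """Move a fraction t along the unit-sphere geodesic from a to b."""
    a, b = normalize(a), normalize(b)
    # Clamp the dot product so rounding error cannot push acos out of range.
    dot = max(-1.0, min(1.0, sum(x * y for x, y in zip(a, b))))
    omega = math.acos(dot)  # angle between the two unit vectors
    if omega < 1e-8:
        return list(a)  # vectors already coincide
    if math.pi - omega < 1e-8:
        raise ValueError("geodesic is not unique for antipodal vectors")
    s = math.sin(omega)
    wa = math.sin((1.0 - t) * omega) / s
    wb = math.sin(t * omega) / s
    return [wa * x + wb * y for x, y in zip(a, b)]


# Hypothetical toy vectors; real prompt embeddings have far more dimensions.
prompt_emb = [0.9, 0.1, 0.4]
neutral_anchor = [0.5, 0.5, 0.7]
debiased = slerp(prompt_emb, neutral_anchor, 0.5)
```

Unlike a straight-line blend, the result stays on the unit sphere for every `t`, which is one plausible reason to prefer a geodesic path when the direction of an embedding carries its meaning.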

📝 Abstract
Text-to-video (T2V) diffusion models have achieved rapid progress, yet their demographic biases, particularly gender bias, remain largely unexplored. We present FairT2V, a training-free debiasing framework for text-to-video generation that mitigates encoder-induced bias without finetuning. We first analyze demographic bias in T2V models and show that it primarily originates from pretrained text encoders, which encode implicit gender associations even for neutral prompts. We quantify this effect with a gender-leaning score that correlates with bias in generated videos. Based on this insight, FairT2V mitigates demographic bias by neutralizing prompt embeddings via anchor-based spherical geodesic transformations while preserving semantics. To maintain temporal coherence, we apply debiasing only during early identity-forming steps through a dynamic denoising schedule. We further propose a video-level fairness evaluation protocol combining VideoLLM-based reasoning with human verification. Experiments on the modern T2V model Open-Sora show that FairT2V substantially reduces demographic bias across occupations with minimal impact on video quality.
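Two ideas from the abstract can be sketched in a few lines: a gender-leaning score for a prompt embedding, and restricting the debiased embedding to early denoising steps. Both functions are assumptions made for illustration. The abstract does not define the score, so the sketch uses a difference of cosine similarities to two hypothetical anchor embeddings, and it replaces the paper's dynamic schedule with a simple fixed cutoff (`early_steps`).

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v)


def gender_leaning(prompt_emb, male_anchor, female_anchor):
    """Assumed score: positive leans male, negative leans female, 0 is balanced."""
    return cosine(prompt_emb, male_anchor) - cosine(prompt_emb, female_anchor)


def embedding_for_step(step, original_emb, debiased_emb, early_steps=3):
    """Use the debiased embedding for the first `early_steps` steps only.

    A fixed cutoff stands in for the dynamic schedule described in the paper.
    """
    return debiased_emb if step < early_steps else original_emb


# Hypothetical 2-D anchors, chosen only to make the arithmetic easy to follow.
male_anchor, female_anchor = [1.0, 0.0], [0.0, 1.0]
score = gender_leaning([1.0, 1.0], male_anchor, female_anchor)
```

In this toy setup a prompt embedding equidistant from both anchors scores zero, and later denoising steps fall back to the original embedding so that appearance fixed early is not disturbed.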

Problem

Research questions and friction points this paper is trying to address.

demographic bias
gender bias
text-to-video diffusion models
fairness
prompt embeddings

Innovation

Methods, ideas, or system contributions that make the work stand out.

training-free debiasing
text-to-video diffusion models
spherical geodesic transformation
temporal coherence
fairness evaluation

Authors

Haonan Zhong
School of Computer Science and Engineering, University of New South Wales, Australia
Wei Song
School of Computer Science and Engineering, University of New South Wales, Australia
Tingxu Han
State Key Laboratory for Novel Software Technology, Nanjing University, China
M. Pagnucco
School of Computer Science and Engineering, University of New South Wales, Australia
Jingling Xue
IEEE Fellow (Computer Society), Scientia Professor, School of Computer Science and Engineering, UNSW
Programming Languages, Compilers, Program Analysis
Yang Song
Associate Professor, University of New South Wales
Biomedical Image Analysis, Computer Vision, Machine Learning, Artificial Intelligence