FAIRT2V: Training-Free Debiasing for Text-to-Video Diffusion Models

📅 2026-01-28

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses gender bias in text-to-video (T2V) diffusion models. Its analysis attributes the bias primarily to implicit gender associations in pretrained text encoders, which appear even for gender-neutral prompts. To mitigate this, the authors propose the first training-free, real-time debiasing framework: it neutralizes bias in prompt embeddings via an anchor-based spherical geodesic transformation and applies this intervention only during early denoising steps, following a dynamic schedule, to preserve temporal consistency. They also establish the first video-level fairness evaluation protocol, which combines VideoLLM-based metrics with human assessment. Experiments on Open-Sora show that the method substantially reduces occupation-related gender bias while maintaining near-original generation quality.
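To make the scheduling idea concrete, here is a minimal sketch of restricting an intervention to the early denoising steps. It assumes a hard cutoff at a fixed fraction of the steps; the names `use_debiased`, `conditioning_schedule`, and `early_fraction` are hypothetical, and the summary does not specify how the actual schedule is chosen.

```python
import math


def use_debiased(step: int, num_steps: int, early_fraction: float = 0.3) -> bool:
    """Return True if the debiased prompt embedding should condition this step.

    Assumption: a hard cutoff after the first `early_fraction` of the steps.
    The real schedule is described only as "dynamic" and may differ.
    """
    if not 0.0 <= early_fraction <= 1.0:
        raise ValueError("early_fraction must lie in [0, 1]")
    cutoff = math.ceil(early_fraction * num_steps)
    return step < cutoff


def conditioning_schedule(num_steps: int, early_fraction: float = 0.3) -> list[str]:
    """List which embedding (hypothetically) conditions each denoising step."""
    return [
        "debiased" if use_debiased(s, num_steps, early_fraction) else "original"
        for s in range(num_steps)
    ]
```

In a sampling loop, one would pass the debiased embedding to the denoiser while `use_debiased` is true and the original embedding afterwards, so that later steps refine detail under the unmodified prompt.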

📝 Abstract

Text-to-video (T2V) diffusion models have achieved rapid progress, yet their demographic biases, particularly gender bias, remain largely unexplored. We present FairT2V, a training-free debiasing framework for text-to-video generation that mitigates encoder-induced bias without finetuning. We first analyze demographic bias in T2V models and show that it primarily originates from pretrained text encoders, which encode implicit gender associations even for neutral prompts. We quantify this effect with a gender-leaning score that correlates with bias in generated videos. Based on this insight, FairT2V mitigates demographic bias by neutralizing prompt embeddings via anchor-based spherical geodesic transformations while preserving semantics. To maintain temporal coherence, we apply debiasing only during early identity-forming steps through a dynamic denoising schedule. We further propose a video-level fairness evaluation protocol combining VideoLLM-based reasoning with human verification. Experiments on the modern T2V model Open-Sora show that FairT2V substantially reduces demographic bias across occupations with minimal impact on video quality.
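The following sketch illustrates one way a gender-leaning score and a spherical geodesic neutralization could fit together. It rests on several assumptions that are not taken from the abstract: the prompt is represented by a single pooled embedding vector, the score is the difference of cosine similarities to a "male" and a "female" anchor embedding, and neutralization moves the embedding along the great-circle arc toward the opposite anchor until that score reaches zero. All function names are hypothetical.

```python
import numpy as np


def unit(x: np.ndarray) -> np.ndarray:
    """Project a vector onto the unit sphere."""
    return x / np.linalg.norm(x)


def gender_leaning(e, male_anchor, female_anchor) -> float:
    """Hypothetical score: positive leans male, negative leans female."""
    e = unit(e)
    return float(e @ unit(male_anchor) - e @ unit(female_anchor))


def slerp(u: np.ndarray, v: np.ndarray, t: float) -> np.ndarray:
    """Point at fraction t along the geodesic from unit vector u to unit vector v."""
    omega = np.arccos(np.clip(u @ v, -1.0, 1.0))
    s = np.sin(omega)
    if s < 1e-8:
        if omega < 1.0:  # vectors are (nearly) identical
            return u.copy()
        raise ValueError("antipodal vectors: the geodesic is not unique")
    return (np.sin((1.0 - t) * omega) * u + np.sin(t * omega) * v) / s


def neutralize(e, male_anchor, female_anchor, iters: int = 50) -> np.ndarray:
    """Slide e along the geodesic toward the opposite anchor until the score is ~0.

    The original norm is restored afterwards; only the direction changes.
    """
    norm = np.linalg.norm(e)
    u = unit(e)
    score = gender_leaning(u, male_anchor, female_anchor)
    if abs(score) < 1e-12:
        return np.array(e, dtype=float)
    sign = np.sign(score)
    target = unit(female_anchor) if score > 0 else unit(male_anchor)
    lo, hi = 0.0, 1.0  # the score changes sign somewhere between t=0 and t=1
    for _ in range(iters):  # bisection on the interpolation fraction
        mid = 0.5 * (lo + hi)
        if sign * gender_leaning(slerp(u, target, mid), male_anchor, female_anchor) > 0:
            lo = mid
        else:
            hi = mid
    return norm * slerp(u, target, hi)
```

Interpolating on the sphere rather than linearly keeps the embedding's norm fixed and changes only its direction, which is one plausible reading of "preserving semantics"; how the anchors are actually constructed and how far the embedding is moved are not stated in the abstract.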

Problem

Research questions and friction points this paper is trying to address.

demographic bias

gender bias

text-to-video diffusion models

fairness

prompt embeddings

Innovation

Methods, ideas, or system contributions that make the work stand out.

training-free debiasing

text-to-video diffusion models

spherical geodesic transformation

temporal coherence

fairness evaluation

🔎 Similar Papers

Rethinking Training for De-biasing Text-to-Image Generation: Unlocking the Potential of Stable Diffusion

📅 2024-08-22

📈 Citations: 1