🤖 AI Summary
This study investigates how inherent social group biases in encoder-based vision-language models (VLMs) propagate to downstream zero-shot text-to-image (TTI) and image-to-text (ITT) retrieval tasks. We propose the first analytical framework quantifying the association between intrinsic representational bias and extrinsic retrieval bias, grounded in correlation-based metrics and instantiated via a standardized evaluation protocol across three VLMs, six socially defined demographic groups, and bidirectional retrieval. Experimental results reveal a strong correlation between intrinsic and extrinsic bias (Spearman's ρ = 0.83 ± 0.10 across 114 configurations), show that larger model scale and higher retrieval accuracy exacerbate bias propagation, and show that propagation is markedly less stable for underrepresented groups, further undermining fairness for those groups. Based on these findings, we introduce a novel benchmark task targeting group–sentiment signal propagation, offering a reproducible methodology and empirical foundation for fairness research in VLMs.
📝 Abstract
To build fair AI systems, we need to understand how social-group biases intrinsic to foundational encoder-based vision-language models (VLMs) manifest as biases in downstream tasks. In this study, we demonstrate that intrinsic biases in VLM representations systematically "carry over," or propagate, into zero-shot retrieval tasks, revealing how deeply rooted biases shape a model's outputs. We introduce a controlled framework to measure this propagation by correlating (a) intrinsic measures of bias in the representational space with (b) extrinsic measures of bias in zero-shot text-to-image (TTI) and image-to-text (ITT) retrieval. Results show substantial correlations between intrinsic and extrinsic bias, with an average Spearman's ρ = 0.83 ± 0.10. This pattern is consistent across 114 analyses, both retrieval directions, six social groups, and three distinct VLMs. Notably, we find that larger and better-performing models exhibit greater bias propagation, a finding that raises concerns given the trend toward increasingly complex AI models. Our framework introduces baseline evaluation tasks to measure the propagation of group and valence signals. Investigations reveal that underrepresented groups experience less robust propagation, further skewing their model-related outcomes.
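The core measurement described above can be sketched as a rank correlation between per-configuration bias scores. The snippet below is a minimal illustration, not the paper's implementation: the variable names and all numeric values are hypothetical, and it uses the closed-form Spearman formula (valid when there are no tied ranks).

```python
def rank(values):
    """Return 1-based ranks (assumes no tied values)."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0] * len(values)
    for r, idx in enumerate(order, start=1):
        ranks[idx] = r
    return ranks

def spearman_rho(x, y):
    """Spearman rank correlation: 1 - 6*sum(d^2) / (n*(n^2 - 1))."""
    rx, ry = rank(x), rank(y)
    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d2 / (n * (n ** 2 - 1))

# Hypothetical bias scores for one (model, social group, direction) setup:
# intrinsic = association strength measured in the embedding space,
# extrinsic = skew measured in zero-shot TTI/ITT retrieval results.
intrinsic = [0.10, 0.20, 0.30, 0.40, 0.50, 0.60]
extrinsic = [0.15, 0.05, 0.35, 0.25, 0.55, 0.45]

print(f"rho = {spearman_rho(intrinsic, extrinsic):.2f}")  # → rho = 0.83
```

In the paper's framework this correlation is computed per configuration and then aggregated, yielding the reported average of ρ = 0.83 ± 0.10 over 114 analyses.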