Ask Me Again Differently: GRAS for Measuring Bias in Vision Language Models on Gender, Race, Age, and Skin Tone

📅 2025-08-26
📈 Citations: 0
Influential: 0
🤖 AI Summary
Vision-language models (VLMs) exhibit implicit biases along demographic dimensions, including gender, race, age, and skin tone, yet the field lacks systematic, quantitative methods for measuring them. Method: The authors propose GRAS, a comprehensive benchmark grounded in the visual question answering (VQA) paradigm. GRAS provides a diverse, large-scale collection of image–question pairs covering the broadest spectrum of demographic attributes to date, coupled with a novel evaluation protocol that poses each question in multiple formulations, and it integrates human verification with automated scoring for robustness and reliability. Contribution/Results: Evaluated on five state-of-the-art VLMs, GRAS reveals pervasive and severe demographic bias: even the least biased model achieves a GRAS Bias Score of only 2 out of 100. GRAS is presented as the first framework enabling fine-grained, cross-dimensional, and fully reproducible bias quantification for VLMs. All code, data, and evaluation results are publicly released, establishing a standardized, open benchmark for fairness research in vision-language modeling.

📝 Abstract
As Vision Language Models (VLMs) become integral to real-world applications, understanding their demographic biases is critical. We introduce GRAS, a benchmark for uncovering demographic biases in VLMs across gender, race, age, and skin tone, offering the most diverse coverage to date. We further propose the GRAS Bias Score, an interpretable metric for quantifying bias. We benchmark five state-of-the-art VLMs and reveal concerning bias levels, with the least biased model attaining a GRAS Bias Score of only 2 out of 100. Our findings also reveal a methodological insight: evaluating bias in VLMs with visual question answering (VQA) requires considering multiple formulations of a question. Our code, data, and evaluation results are publicly available.
Problem

Research questions and friction points this paper is trying to address.

Measuring demographic bias in Vision Language Models
Evaluating bias across gender, race, age, and skin tone
Developing interpretable metrics for quantifying VLM bias
Innovation

Methods, ideas, or system contributions that make the work stand out.

GRAS benchmark for demographic bias evaluation
GRAS Bias Score interpretable metric
Multiple question formulations for VQA assessment
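The multi-question idea above can be sketched in a few lines: ask the VLM the same question in several paraphrased formulations, aggregate the answers per image, and compare answer rates across demographic groups. This is an illustrative sketch only; `fairness_score`, its parity-gap formula, and the 0–100 scaling are hypothetical stand-ins, not the paper's actual GRAS Bias Score definition.

```python
from collections import Counter

def majority_answer(answers):
    # Aggregate the model's answers across paraphrased formulations
    # of the same question into a single per-image answer.
    return Counter(answers).most_common(1)[0][0]

def fairness_score(group_answers, positive="yes"):
    # group_answers: {group: [per-image lists of paraphrase answers]}.
    # Illustrative score: 100 minus the largest gap in positive-answer
    # rates between groups (100 = no gap; NOT the GRAS formula).
    rates = {}
    for group, per_image in group_answers.items():
        majorities = [majority_answer(a) for a in per_image]
        rates[group] = sum(m == positive for m in majorities) / len(majorities)
    gap = max(rates.values()) - min(rates.values())
    return round(100 * (1 - gap))

# Toy answers from a hypothetical VLM, three paraphrases per image:
answers = {
    "group_a": [["yes", "yes", "no"], ["yes", "yes", "yes"]],
    "group_b": [["no", "no", "yes"], ["no", "yes", "no"]],
}
print(fairness_score(answers))  # → 0 (maximal gap between the two groups)
```

Aggregating over paraphrases before scoring is the point of the protocol: a single phrasing can flip a VLM's answer, so per-formulation scores conflate prompt sensitivity with demographic bias.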
Shaivi Malik
AI Institute, University of South Carolina, USA
Hasnat Md Abdullah
AI Institute, University of South Carolina, USA; Texas A&M University, USA
Sriparna Saha
IIT Patna, India
Amit Sheth
NCR Chair & Prof.; Founding Director, AI Institute; U. of South Carolina
Neurosymbolic AI · Knowledge Graph · Knowledge-infused Learning · Semantic Web · Artificial Intelligence