🤖 AI Summary
This study investigates whether state-of-the-art image generation models - Gemini Flash 2.5 and GPT Image 1.5 - exhibit gender and skin-tone biases even when prompted with semantically neutral text. To this end, we propose using neutral prompts as diagnostic probes and introduce a lighting-aware chromatic analysis framework that integrates hybrid color normalization, facial landmark masking, and perceptually uniform skin tone quantification based on the Monk Skin Tone (MST), PERLA, and Fitzpatrick scales. Our systematic evaluation of 3,200 generated images reveals that both models produce "default white" outputs in over 96% of cases, yet display divergent gender biases: Gemini exhibits a strong female skew, whereas GPT favors light-skinned males. These findings demonstrate that neutral prompts fail to eliminate implicit biases and underscore the necessity of the proposed methodology for fairness assessment in generative vision systems.
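To give a concrete sense of the kind of illuminant correction such a framework might apply before any skin tone is measured, the sketch below implements a simple gray-world normalization in Python. This is a generic, assumed stand-in: the summary does not specify what the "hybrid" color normalization consists of, so the function name and approach here are illustrative only.

```python
import numpy as np

def gray_world_normalize(img: np.ndarray) -> np.ndarray:
    """
    Gray-world white balancing: scale each channel so its mean matches the
    global mean, reducing the illuminant's color cast before skin-tone
    measurement. `img` is an (H, W, 3) float array with values in [0, 1].
    """
    channel_means = img.reshape(-1, 3).mean(axis=0)
    scale = channel_means.mean() / np.maximum(channel_means, 1e-8)
    return np.clip(img * scale, 0.0, 1.0)
```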
📝 Abstract
This study quantifies gender and skin-tone bias in two widely deployed commercial image generators - Gemini Flash 2.5 Image (NanoBanana) and GPT Image 1.5 - to test the assumption that neutral prompts yield demographically neutral outputs. We generated 3,200 photorealistic images using four semantically neutral prompts. The analysis employed a rigorous pipeline combining hybrid color normalization, facial landmark masking, and perceptually uniform skin tone quantification using the Monk Skin Tone (MST), PERLA, and Fitzpatrick scales. Neutral prompts produced highly polarized defaults. Both models exhibited a strong "default white" bias (>96% of outputs). However, they diverged sharply on gender: Gemini favored female-presenting subjects, while GPT favored male-presenting subjects with lighter skin tones. This research provides a large-scale, comparative audit of state-of-the-art models using an illumination-aware colorimetric methodology, distinguishing aesthetic rendering from underlying pigmentation in synthetic imagery. The study demonstrates that neutral prompts function as diagnostic probes rather than neutral instructions. It offers a robust framework for auditing algorithmic visual culture and challenges the sociolinguistic assumption that unmarked language results in inclusive representation.
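The quantification step can be pictured as nearest-neighbor matching of the masked, normalized skin color against a perceptually uniform scale in CIELAB. The sketch below is a minimal illustration under several assumptions: it relies on scikit-image for the color-space conversion, uses CIE76 Delta E as the distance, and uses placeholder hex swatches for the 10-point Monk Skin Tone scale; the paper's exact reference values, distance metric, and masking procedure are not given here and may differ.

```python
import numpy as np
from skimage import color

# Placeholder sRGB reference swatches for the 10-point Monk Skin Tone (MST) scale.
# Illustrative values only; the official MST swatches should be used in practice.
MST_HEX = [
    "f6ede4", "f3e7db", "f7ead0", "eadaba", "d7bd96",
    "a07e56", "825c43", "604134", "3a312a", "292420",
]

def hex_to_lab(hex_code: str) -> np.ndarray:
    """Convert an sRGB hex string to a CIELAB triplet."""
    rgb = np.array([int(hex_code[i:i + 2], 16) for i in (0, 2, 4)], dtype=float) / 255.0
    return color.rgb2lab(rgb.reshape(1, 1, 3)).reshape(3)

MST_LAB = np.stack([hex_to_lab(h) for h in MST_HEX])

def classify_mst(skin_pixels_rgb: np.ndarray) -> int:
    """
    Assign a 1-10 MST shade to a set of masked skin pixels.

    skin_pixels_rgb: (N, 3) array of sRGB values in [0, 1], e.g. the pixels
    retained by a facial-landmark mask after color normalization.
    Returns the 1-based index of the nearest MST swatch by CIE76 Delta E,
    computed on the mean skin color in CIELAB.
    """
    lab = color.rgb2lab(skin_pixels_rgb.reshape(-1, 1, 3)).reshape(-1, 3)
    mean_lab = lab.mean(axis=0)
    delta_e = np.linalg.norm(MST_LAB - mean_lab, axis=1)  # CIE76 distance
    return int(np.argmin(delta_e)) + 1

if __name__ == "__main__":
    # Example: a patch of light skin tones maps to a low MST index.
    patch = np.full((100, 3), [0.93, 0.85, 0.78])
    print("MST shade:", classify_mst(patch))
```

In an actual audit, the pixels fed to such a classifier would come from the landmark-masked facial region of each generated image rather than a synthetic patch, and the same nearest-swatch mapping could be repeated against PERLA and Fitzpatrick reference values to report all three scales.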