🤖 AI Summary
Prior bias research in multimodal AI has largely overlooked how linguistic structure, particularly grammatical gender, influences the visual representations generated by text-to-image (T2I) models. Method: We construct a cross-lingual benchmark covering five languages with grammatical gender (French, Spanish, German, Italian, Russian) and two gender-neutral control languages (English, Chinese), and systematically evaluate three state-of-the-art T2I models, generating 28,800 images from 800 unique prompts for quantitative analysis. Contribution/Results: We provide empirical evidence that grammatical gender induces systematic visual biases: masculine grammatical gender raises male representation to 73% on average (versus 22% for gender-neutral English), while feminine grammatical gender raises female representation to 38% (versus 28% for English). The effects are stronger for high-resource languages and vary with model architecture. We introduce grammatical gender as a new dimension of multimodal fairness, showing that the grammatical structure of a language, and not only its content, shapes AI-generated visual content. This work integrates linguistic typology into multimodal bias evaluation.
📝 Abstract
Research on bias in Text-to-Image (T2I) models has primarily focused on demographic representation and stereotypical attributes, overlooking a fundamental question: how does grammatical gender influence visual representation across languages? We introduce a cross-linguistic benchmark examining words where grammatical gender contradicts stereotypical gender associations (e.g., "une sentinelle", which is grammatically feminine in French but refers to the stereotypically masculine concept "guard"). Our dataset spans five gendered languages (French, Spanish, German, Italian, Russian) and two gender-neutral control languages (English, Chinese), comprising 800 unique prompts that generated 28,800 images across three state-of-the-art T2I models. Our analysis reveals that grammatical gender dramatically influences image generation: masculine grammatical markers increase male representation to 73% on average (compared to 22% with gender-neutral English), while feminine grammatical markers increase female representation to 38% (compared to 28% in English). These effects vary systematically by language resource availability and model architecture, with high-resource languages showing stronger effects. Our findings establish that language structure itself, not just content, shapes AI-generated visual outputs, introducing a new dimension for understanding bias and fairness in multilingual, multimodal systems.
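The reported figures can be restated as shifts relative to the English baseline. The short Python sketch below is only illustrative arithmetic on the numbers quoted in the abstract; the function and variable names are hypothetical. The images-per-prompt calculation additionally assumes that every prompt was sent to each of the three models the same number of times, which the abstract does not state.

```python
# Figures quoted in the abstract: (percent with grammatical marker, percent in English).
REPORTED = {
    "masculine markers, male representation": (73, 22),
    "feminine markers, female representation": (38, 28),
}


def shift(treated, baseline):
    """Return (percentage-point change, ratio to the baseline)."""
    return treated - baseline, treated / baseline


for label, (treated, baseline) in REPORTED.items():
    points, ratio = shift(treated, baseline)
    print(f"{label}: +{points} points, {ratio:.2f}x the English baseline")

# Consistency check on the dataset size.
# ASSUMPTION (not stated in the abstract): each of the 800 prompts is run
# on all three models with an equal number of images per run.
TOTAL_IMAGES = 28_800
UNIQUE_PROMPTS = 800
MODELS = 3
images_per_prompt_per_model = TOTAL_IMAGES // (UNIQUE_PROMPTS * MODELS)
print(f"implied images per prompt per model: {images_per_prompt_per_model}")
```

Read this way, the masculine-marker effect (a 51-point shift) is about five times larger in absolute terms than the feminine-marker effect (a 10-point shift).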