An Empirical Investigation of Gender Stereotype Representation in Large Language Models: The Italian Case

📅 2025-07-25
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This study investigates implicit gender–occupation stereotyping in Italian by large language models (LLMs), leveraging Italian’s rich grammatical gender marking. We design structured, gender-neutral prompts and collect 3,600 occupational responses from ChatGPT and Gemini via their APIs. Responses are analyzed using a hierarchical occupation classification framework to quantify model-generated gendered pronoun associations. This constitutes the first systematic evaluation of gender bias in mainstream LLMs for a highly inflected, non-English language. Results reveal strong, statistically significant associations between low-status occupations (e.g., “assistant”) and feminine pronouns—100% for Gemini and 97% for ChatGPT—demonstrating that LLMs amplify sociocultural stereotypes in morphologically gendered languages. These findings expose critical gaps in current multilingual alignment and fairness optimization strategies. The work provides empirical evidence and a methodological framework for cross-lingual bias assessment and mitigation in LLMs.
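
A minimal sketch of the collection step described above, assuming the official `openai` and `google-generativeai` Python clients; the prompt wording, loop count, and variable names are illustrative assumptions, not the paper's exact materials.

```python
# Illustrative sketch of collecting occupational responses via both APIs.
# Assumes OPENAI_API_KEY and GOOGLE_API_KEY are set in the environment;
# the prompt below is a placeholder, not the study's exact wording.
import os

from openai import OpenAI
import google.generativeai as genai

# Placeholder Italian prompt; the paper's prompts are carefully
# constructed to avoid any grammatical gender marking.
PROMPT = "Scrivi una breve storia su manager e assistente che preparano una riunione."

openai_client = OpenAI()  # reads OPENAI_API_KEY from the environment
genai.configure(api_key=os.environ["GOOGLE_API_KEY"])
gemini = genai.GenerativeModel("gemini-1.5-flash")

def ask_chatgpt(prompt: str) -> str:
    resp = openai_client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content

def ask_gemini(prompt: str) -> str:
    return gemini.generate_content(prompt).text

# Repeated sampling builds the response corpus (3,600 responses in the paper).
responses = {"chatgpt": [], "gemini": []}
for _ in range(10):  # small count for illustration only
    responses["chatgpt"].append(ask_chatgpt(PROMPT))
    responses["gemini"].append(ask_gemini(PROMPT))
```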

📝 Abstract
The increasing use of Large Language Models (LLMs) in a wide variety of domains has raised concerns about how easily they can perpetuate stereotypes and contribute to the generation of biased content. With a focus on gender and professional bias, this work examines how LLMs shape responses to ungendered prompts and thereby produce biased outputs. The analysis uses a structured experimental method, issuing prompts built around three professional job combinations, each characterized by a hierarchical relationship. The study uses Italian, a language with extensive grammatical gender marking, to highlight potential limitations in current LLMs' ability to generate objective text in non-English languages. Two popular LLM-based chatbots are examined, namely OpenAI ChatGPT (gpt-4o-mini) and Google Gemini (gemini-1.5-flash). Through their APIs, we collected a total of 3,600 responses. The results highlight how content generated by LLMs can perpetuate stereotypes: for example, Gemini associated 100% (ChatGPT 97%) of 'she' pronouns with the 'assistant' rather than the 'manager'. The presence of bias in AI-generated text can have significant implications in many fields, such as workplaces or hiring processes, raising ethical concerns about its use. Understanding these risks is pivotal to developing mitigation strategies and ensuring that AI-based systems do not deepen social inequalities but instead contribute to more equitable outcomes. Future research directions include expanding the study to additional chatbots and languages, refining prompt engineering methods, and building on a larger experimental base.
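
As a hedged illustration of the kind of tabulation behind figures like the 97%/100% association above, the sketch below runs a chi-square independence test on an occupation-by-gender contingency table; the counts are made up, and `scipy` is an assumed dependency, since this summary does not name the paper's statistics tooling.

```python
# Chi-square test of independence between occupation and gendered reference.
# The counts are illustrative placeholders, not the paper's data.
from scipy.stats import chi2_contingency

# Rows: occupation; columns: ('he', 'she') references in the responses.
counts = [
    [95, 5],   # 'manager'   (hypothetical counts)
    [3, 97],   # 'assistant' (hypothetical counts)
]
chi2, p_value, dof, _expected = chi2_contingency(counts)
print(f"chi2={chi2:.2f}, dof={dof}, p={p_value:.3g}")
# A very small p-value rejects independence: the gendered pronoun a model
# picks depends on the occupation, which is the kind of statistically
# significant association the paper reports.
```
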
Problem

Research questions and friction points this paper is trying to address.

Investigates gender stereotype representation in LLM-generated Italian text
Examines bias in LLM responses to ungendered professional prompts
Assesses ethical risks of AI-generated biased workplace content
Innovation

Methods, ideas, or system contributions that make the work stand out.

Structured experimental method with hierarchical professional job pairings
Use of Italian's grammatical gender marking for bias detection
Comparison of OpenAI ChatGPT and Google Gemini outputs (see the prompt-construction sketch below)
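
A sketch of how the structured prompt grid might be assembled: hierarchical role pairs crossed with prompt templates. Only the manager/assistant pair is named in this summary, so the other pairs and the template wording are hypothetical placeholders; the epicene Italian nouns and article-free phrasing are assumptions about how gender-neutral prompts could be built, not the study's documented design.

```python
# Sketch of a structured prompt grid: hierarchical role pairs x templates.
# Only manager/assistant is named in the summary; the other pairs and the
# template wording are hypothetical placeholders.
from itertools import product

ROLE_PAIRS = [
    ("manager", "assistente"),     # the pair discussed in the paper
    ("dirigente", "dipendente"),   # hypothetical pair (epicene Italian nouns)
    ("responsabile", "stagista"),  # hypothetical pair (epicene Italian nouns)
]

# Article-free Italian templates, an assumed device to avoid gender marking.
TEMPLATES = [
    "Scrivi una breve storia su {senior} e {junior} che preparano una riunione.",
    "Descrivi una giornata di lavoro di {senior} e {junior}.",
]

prompts = [
    template.format(senior=senior, junior=junior)
    for (senior, junior), template in product(ROLE_PAIRS, TEMPLATES)
]
for p in prompts:
    print(p)
```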