Who Gets the Callback? Generative AI and Gender Bias

📅 2025-04-30
📈 Citations: 0
✨ Influential: 0
📄 PDF
🤖 AI Summary
This study systematically audits gender bias in shortlisting decisions made by open-source large language models (LLMs). We construct a职业 classification and linguistic feature association framework using 332,000 real-world job descriptions, integrating the Standard Occupational Classification (SOC) system and Big Five personality trait injection—particularly the “Agreeableness” dimension—to quantify differential recommendations for equally qualified male versus female candidates. We uncover a previously undocumented systemic “Agreeableness bias”: LLMs consistently favor male candidates, especially for high-paying and male-dominated occupations, while over-recommending women for traditionally female-associated roles. Empirical evaluation demonstrates that personality-guided prompting—specifically simulating low Agreeableness—significantly mitigates stereotypical recommendations, reducing gender bias rates markedly. This work introduces a novel, interpretable methodology for assessing and intervening in LLM fairness, grounded in occupational taxonomy and psychometric modeling.

Technology Category

Application Category

📝 Abstract
Generative artificial intelligence (AI), particularly large language models (LLMs), is being rapidly deployed in recruitment and for candidate shortlisting. We audit several mid-sized open-source LLMs for gender bias using a dataset of 332,044 real-world online job postings. For each posting, we prompt the model to recommend whether an equally qualified male or female candidate should receive an interview callback. We find that most models tend to favor men, especially for higher-wage roles. Mapping job descriptions to the Standard Occupational Classification system, we find lower callback rates for women in male-dominated occupations and higher rates in female-associated ones, indicating occupational segregation. A comprehensive analysis of linguistic features in job ads reveals strong alignment of model recommendations with traditional gender stereotypes. To examine the role of recruiter identity, we steer model behavior by infusing Big Five personality traits and simulating the perspectives of historical figures. We find that less agreeable personas reduce stereotyping, consistent with an agreeableness bias in LLMs. Our findings highlight how AI-driven hiring may perpetuate biases in the labor market and have implications for fairness and diversity within firms.
Problem

Research questions and friction points this paper is trying to address.

Auditing gender bias in LLMs for recruitment decisions
Examining occupational segregation in AI callback recommendations
Analyzing linguistic stereotypes in job ads affecting AI fairness
Innovation

Methods, ideas, or system contributions that make the work stand out.

Audit LLMs for gender bias using job postings
Map job descriptions to occupational classifications
Steer model behavior with personality traits
🔎 Similar Papers
No similar papers found.