🤖 AI Summary
This study systematically audits gender bias in shortlisting decisions made by open-source large language models (LLMs). Using 332,044 real-world job descriptions, we construct a classification and linguistic-feature association framework that integrates the Standard Occupational Classification (SOC) system with Big Five personality-trait injection, particularly the "Agreeableness" dimension, to quantify differential recommendations for equally qualified male versus female candidates. We uncover a previously undocumented systemic "Agreeableness bias": LLMs consistently favor male candidates, especially for high-paying and male-dominated occupations, while over-recommending women for traditionally female-associated roles. Empirical evaluation shows that personality-guided prompting, specifically simulating low Agreeableness, markedly reduces stereotypical recommendations and overall gender bias rates. This work introduces a novel, interpretable methodology for assessing and intervening in LLM fairness, grounded in occupational taxonomy and psychometric modeling.
📄 Abstract
Generative artificial intelligence (AI), particularly large language models (LLMs), is being rapidly deployed in recruitment, including candidate shortlisting. We audit several mid-sized open-source LLMs for gender bias using a dataset of 332,044 real-world online job postings. For each posting, we prompt the model to recommend whether an equally qualified male or female candidate should receive an interview callback. We find that most models tend to favor men, especially for higher-wage roles. Mapping job descriptions to the Standard Occupational Classification system, we find lower callback rates for women in male-dominated occupations and higher rates in female-associated ones, indicating occupational segregation. A comprehensive analysis of linguistic features in job ads reveals strong alignment of model recommendations with traditional gender stereotypes. To examine the role of recruiter identity, we steer model behavior by infusing Big Five personality traits and simulating the perspectives of historical figures. We find that less agreeable personas reduce stereotyping, consistent with an agreeableness bias in LLMs. Our findings highlight how AI-driven hiring may perpetuate biases in the labor market and have implications for fairness and diversity within firms.