🤖 AI Summary
This work investigates the bidirectional interaction between name-based gender representations and occupational semantics in large language models (LLMs). To this end, it combines embedding-space similarity analysis, evaluation on a downstream occupation prediction task, statistical correlation testing, and bias sensitivity quantification. The methodology reveals how occupational stereotypes modulate name-level gender associations and, conversely, how such associations shape occupational prediction bias. The paper provides empirical evidence that LLMs' internal first-name embeddings align significantly with real-world gender distributions and occupational co-occurrence patterns; in particular, feminine-leaning names substantially increase prediction probabilities for female-dominated occupations. Finally, it explores embedding-level gender association as an interpretable, quantifiable internal proxy for diagnosing extrinsic societal biases in LLMs, while noting that using this signal reliably for bias detection remains challenging.
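The summary does not spell out the embedding-space analysis concretely, so here is a minimal sketch of one way such a gender-association score could be computed and correlated with real-world statistics. The encoder choice (`bert-base-uncased`), the pronoun anchors (`she`/`he`), the mean pooling, and the name-to-female-share data are all illustrative assumptions, not the paper's actual setup.

```python
# Sketch: embedding-level gender association, correlated with census-style
# gender statistics. All modeling choices below are illustrative assumptions.
import torch
from transformers import AutoModel, AutoTokenizer
from scipy.stats import pearsonr

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")
model.eval()

def embed(text: str) -> torch.Tensor:
    """Mean-pool the last hidden state into a single vector."""
    inputs = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state  # (1, seq_len, dim)
    return hidden.mean(dim=1).squeeze(0)

def gender_association(name: str) -> float:
    """Cosine similarity to a feminine anchor minus a masculine anchor;
    positive values indicate a more feminine-leaning embedding."""
    v = embed(name)
    fem, masc = embed("she"), embed("he")
    cos = torch.nn.functional.cosine_similarity
    return (cos(v, fem, dim=0) - cos(v, masc, dim=0)).item()

# Hypothetical data: share of female bearers per name (placeholder values).
names = {"Emma": 0.98, "Olivia": 0.99, "Liam": 0.01, "Noah": 0.02}
scores = [gender_association(n) for n in names]
r, p = pearsonr(scores, list(names.values()))
print(f"Pearson r = {r:.3f} (p = {p:.3g})")
```

Anchoring on a pronoun pair is one common convention in the bias-measurement literature; gendered word lists or projection onto a learned gender direction would be equally plausible substitutes.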
📝 Abstract
We examine LLM representations of gender for first names in various occupational contexts to study how occupations and the perceived gender of first names in LLMs influence each other. We find that LLMs' first-name gender representations correlate with real-world gender statistics associated with the name and are influenced by the co-occurrence of stereotypically feminine or masculine occupations. Additionally, we study the influence of first-name gender representations on LLMs in a downstream occupation prediction task and their potential as an internal metric to identify extrinsic model biases. While feminine first-name embeddings often raise the probabilities for female-dominated jobs (and vice versa for male-dominated jobs), reliably using these internal gender representations for bias detection remains challenging.
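As a rough illustration of the downstream occupation prediction probe described in the abstract, one could compare the next-token probability an autoregressive LM assigns to an occupation word after name-bearing prompts. The template, the model (`gpt2`), and the name/occupation pairs below are hypothetical stand-ins, not the paper's actual experimental design.

```python
# Sketch: how much does the choice of first name shift an LM's probability
# for a stereotyped occupation? Prompts, model, and pairs are placeholders.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

def occupation_prob(name: str, occupation: str) -> float:
    """Probability assigned to `occupation` as the next word after the
    template prompt (first subword token only, for simplicity)."""
    prompt = f"{name} works as a"
    inputs = tokenizer(prompt, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits[0, -1]  # next-token distribution
    probs = torch.softmax(logits, dim=-1)
    occ_id = tokenizer.encode(f" {occupation}")[0]  # leading-space variant
    return probs[occ_id].item()

# Hypothetical name pair and stereotyped occupations, for illustration only.
for occ in ["nurse", "engineer"]:
    p_f = occupation_prob("Emma", occ)
    p_m = occupation_prob("Liam", occ)
    print(f"{occ}: P(Emma)={p_f:.4f}  P(Liam)={p_m:.4f}  ratio={p_f/p_m:.2f}")
```

A ratio above 1 for "nurse" and below 1 for "engineer" would be consistent with the pattern the abstract reports, though the abstract also cautions that such internal signals are not yet reliable bias detectors.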