🤖 AI Summary
This study identifies inherent limitations in how large language models (LLMs) represent social identities (e.g., gender, race): because of how current LLMs are trained, they misportray and flatten demographic groups, and identity prompts can further essentialize identities, failing to capture identity's contextual situatedness and individual heterogeneity. Combining analytical argument with comparative human studies (3,200 participants across 16 demographic identities), the authors show empirically that four state-of-the-art LLMs distort identity expression, and that inference-time interventions (e.g., identity-aware prompting) reduce but do not eliminate these harms. Each limitation is connected to a pernicious history that explains why it is harmful to marginalized demographic groups. The authors caution against deploying LLMs as replacements for human participants in tasks where identity is relevant, while noting that the proposed techniques may help when LLMs supplement rather than replace humans (e.g., in pilot studies).
📝 Abstract
Large language models (LLMs) are increasing in capability and popularity, propelling their application in new domains -- including as replacements for human participants in computational social science, user testing, annotation tasks, and more. In many settings, researchers seek to distribute their surveys to a sample of participants that is representative of the underlying human population of interest. This means that, to be a suitable replacement, LLMs need to be able to capture the influence of positionality (i.e., the relevance of social identities like gender and race). However, we show that there are two inherent limitations in the way current LLMs are trained that prevent this. We argue analytically for why LLMs are likely to both misportray and flatten the representations of demographic groups, then empirically show this on 4 LLMs through a series of human studies with 3200 participants across 16 demographic identities. We also discuss a third limitation: how identity prompts can essentialize identities. Throughout, we connect each limitation to a pernicious history that explains why it is harmful for marginalized demographic groups. Overall, we urge caution in use cases where LLMs are intended to replace human participants whose identities are relevant to the task at hand. At the same time, in cases where the goal is to supplement rather than replace (e.g., pilot studies), we provide inference-time techniques that we empirically demonstrate reduce, but do not remove, these harms.
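To make the notion of "identity prompts" and inference-time variation concrete, below is a minimal illustrative sketch. It is not the paper's method: the prompt wording, the hypothetical `query_llm` client, and the "sample several responses per identity" strategy are assumptions chosen only to show the general shape of such a setup.

```python
# Illustrative sketch only: constructs an identity-conditioned survey prompt and
# collects several independent samples per prompt (rather than a single completion),
# so downstream analysis can examine a distribution of responses instead of one answer.
# `query_llm` is a hypothetical stand-in for whatever chat-completion client is used.
from typing import Callable, List, Optional

def identity_prompt(question: str, identity: Optional[str]) -> str:
    """Prepend an identity instruction to a survey question; identity=None means no prefix."""
    if identity is None:
        return question
    return (
        f"Answer the following question as someone who identifies as {identity}.\n\n"
        f"{question}"
    )

def sample_responses(
    query_llm: Callable[[str, float], str],  # hypothetical: (prompt, temperature) -> response text
    question: str,
    identity: Optional[str],
    n_samples: int = 5,
    temperature: float = 1.0,
) -> List[str]:
    """Query the model several times at nonzero temperature for one identity-conditioned prompt."""
    prompt = identity_prompt(question, identity)
    return [query_llm(prompt, temperature) for _ in range(n_samples)]

if __name__ == "__main__":
    # Dummy client so the sketch runs without any API; replace with a real call in practice.
    dummy = lambda prompt, temperature: "[model response]"
    print(sample_responses(dummy, "How often do you feel safe walking alone at night?", "a woman"))
```

Even under this kind of sampling, the paper's finding is that such inference-time adjustments reduce, but do not remove, misportrayal and flattening.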