🤖 AI Summary
This study investigates disparities in social perspective distribution—perpetrator, victim, and decision-maker—between humans and large language models (LLMs) when judging implicit gender bias. Methodologically, it employs prompt engineering to elicit perspective-specific responses, human annotation for ground-truth labeling, and cross-model statistical comparison to systematically quantify perspective selection patterns in subjective judgment tasks. Results reveal that all three perspectives are consistently present in both human and LLM responses, yet their distributions differ significantly across groups; inter-model variation in perspective composition substantially exceeds intra-model or individual human variability. Building on these findings, the paper proposes “perspective diversity” as a novel dimension for evaluating LLM task suitability, particularly in fairness-sensitive decision-making contexts. This framework provides empirically grounded criteria and methodological guidance for model selection in sociotechnical applications requiring equitable judgment.
📝 Abstract
In subjective decision-making, where decisions are based on contextual interpretation, Large Language Models (LLMs) can be integrated to present users with additional rationales to consider. The diversity of these rationales is mediated by the ability to consider the perspectives of different social actors. However, it remains unclear whether and how models differ in the distribution of perspectives they provide. We compare the perspectives taken by humans and different LLMs when assessing subtle sexism scenarios. We show that these perspectives fall into a finite set (perpetrator, victim, decision-maker) that is consistently present in the argumentation produced by both humans and LLMs, but in differing distributions and combinations, revealing both similarities and differences between human and model responses, as well as across models. We argue for the need to systematically evaluate LLMs' perspective-taking to identify the most suitable models for a given decision-making task, and we discuss the implications for model evaluation.