Bias, Accuracy, and Trust: Gender-Diverse Perspectives on Large Language Models

📅 2025-06-27
📈 Citations: 0
Influential: 0
🤖 AI Summary
This study investigates how gender-diverse users, including non-binary, transgender, male, and female individuals, perceive bias, accuracy, and trustworthiness in large language models (LLMs) such as ChatGPT, and how those perceptions differ across groups. Employing 25 semi-structured in-depth interviews, thematic coding, controlled comparisons of gendered versus gender-neutral prompts, and a structured user-experience evaluation framework, the work provides the first systematic empirical evidence that non-binary users encounter stereotypical and patronizing responses more frequently than binary users. Crucially, trust shows a pronounced gendered divergence: non-binary participants report higher confidence in model performance yet significantly lower perceived respectfulness in model outputs. The findings challenge the assumption of “technological neutrality” and propose three actionable interventions to advance AI fairness through a pluralistic gender lens: enhancing gender diversity in training data, improving response depth and balance across gender identities, and integrating proactive clarification mechanisms. The work contributes both empirical grounding and methodological guidance for inclusive LLM design.

📝 Abstract
Large language models (LLMs) are becoming increasingly ubiquitous in our daily lives, but numerous concerns about bias in LLMs exist. This study examines how gender-diverse populations perceive bias, accuracy, and trustworthiness in LLMs, specifically ChatGPT. Through 25 in-depth interviews with non-binary/transgender, male, and female participants, we investigate how gendered and neutral prompts influence model responses and how users evaluate these responses. Our findings reveal that gendered prompts elicit more identity-specific responses, with non-binary participants particularly susceptible to condescending and stereotypical portrayals. Perceived accuracy was consistent across gender groups, with errors most noted in technical topics and creative tasks. Trustworthiness varied by gender, with men showing higher trust, especially in performance, and non-binary participants demonstrating higher performance-based trust. Additionally, participants suggested improving the LLMs by diversifying training data, ensuring equal depth in gendered responses, and incorporating clarifying questions. This research contributes to the CSCW/HCI field by highlighting the need for gender-diverse perspectives in LLM development in particular and AI in general, to foster more inclusive and trustworthy systems.
Problem

Research questions and friction points this paper is trying to address.

Examining gender-diverse perceptions of bias in ChatGPT responses.
Assessing accuracy consistency across genders in technical and creative tasks.
Exploring trust variations in LLMs among different gender groups.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Conducts gender-diverse interviews on LLM bias.
Analyzes the impact of gendered vs. gender-neutral prompts (see the sketch after this list).
Proposes diversifying training data for inclusivity.
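
To make the prompt-comparison setup concrete, here is a minimal sketch of how paired gendered and gender-neutral prompts could be collected for analysis. The paper does not publish its protocol or code, so the persona list, prompt wording, model name (`gpt-4o-mini`), and the `query_model`/`collect_responses` helpers below are illustrative assumptions, not the authors' materials.

```python
# Minimal sketch: query an LLM with prompts that are identical except for
# the stated gender identity, so responses can be compared for depth,
# stereotyping, and tone. Personas, wording, and model are assumptions.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# One gender-neutral control plus gendered variants of the same task.
PERSONAS = {
    "neutral": "someone",
    "male": "a man",
    "female": "a woman",
    "non-binary": "a non-binary person",
    "transgender": "a transgender person",
}
TASK = "What career advice would you give {who} considering a move into software engineering?"

def query_model(prompt: str) -> str:
    # Fixed decoding parameters so outputs differ only by prompt condition.
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # assumed model; the study examined ChatGPT
        messages=[{"role": "user", "content": prompt}],
        temperature=0.7,
    )
    return resp.choices[0].message.content

def collect_responses() -> dict[str, str]:
    # A real study would sample each condition repeatedly; one call per
    # condition keeps the sketch short.
    return {label: query_model(TASK.format(who=who))
            for label, who in PERSONAS.items()}

if __name__ == "__main__":
    for condition, response in collect_responses().items():
        print(f"--- {condition} ---\n{response}\n")
        # Paired outputs would then go to thematic coding for comparison.
```

Responses gathered this way could then be rated for respectfulness, depth, and stereotyping across conditions, mirroring the comparison the study's participants performed.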