SCOPE: A Dataset of Stereotyped Prompts for Counterfactual Fairness Assessment of LLMs

πŸ“… 2026-04-07
πŸ“ˆ Citations: 0
✨ Influential: 0
πŸ“„ PDF
πŸ€– AI Summary
Existing fairness evaluation datasets for large language models often suffer from linguistic homogeneity, limited thematic coverage, and a neglect of communicative intent. To address these limitations, this work introduces SCOPE, a novel dataset that incorporates communicative intent as a new dimension within large-scale counterfactual prompt pairs. Constructed through a multidimensional control strategy, SCOPE comprises 241,280 prompts (120,640 counterfactual pairs) spanning 1,438 topics, nine bias dimensions, 1,536 demographic groups, and four types of communicative intent, ensuring semantic alignment across prompts. This resource provides a fine-grained, realistic benchmark for evaluating model fairness, robustness, and counterfactual consistency in contexts that closely mirror authentic human–AI interactions.
πŸ“ Abstract
Large Language Models (LLMs) now serve as the foundation for a wide range of applications, from conversational assistants to decision-support tools, making fairness in their outputs increasingly important. Previous studies have shown that LLM outputs can shift when prompts reference different demographic groups, even when intent and semantic content remain constant. However, existing resources for probing such disparities rely primarily on small, template-based counterfactual examples or fixed sentence pairs. These benchmarks offer limited linguistic diversity, narrow topical coverage, and little support for analyzing how communicative intent affects model behavior. To address these limitations, we introduce SCOPE (Stereotype-COnditioned Prompts for Evaluation), a large-scale dataset of counterfactual prompt pairs designed to enable systematic investigation of group-sensitive behavior in LLMs. SCOPE contains 241,280 prompts organized into 120,640 counterfactual pairs, each grounded in one of 1,438 topics and spanning nine bias dimensions and 1,536 demographic groups. All prompts are generated under four distinct communicative intents (Question, Recommendation, Direction, and Clarification), ensuring broad coverage of common interaction styles. This resource provides a controlled, semantically aligned, and intent-aware basis for evaluating fairness, robustness, and counterfactual consistency.
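To make the dataset design concrete, the sketch below models one counterfactual pair and checks the semantic-alignment property the abstract describes: the two prompts should differ only in the demographic group referenced, with topic, bias dimension, and communicative intent held fixed. The record schema and field names here are assumptions for illustration, not the published SCOPE format.

```python
from dataclasses import dataclass

@dataclass
class CounterfactualPair:
    # Hypothetical record layout; the actual SCOPE schema may differ.
    topic: str            # one of the dataset's 1,438 topics
    bias_dimension: str   # one of the nine bias dimensions
    intent: str           # Question | Recommendation | Direction | Clarification
    group_a: str          # demographic group in the first prompt
    group_b: str          # demographic group in the second prompt
    prompt_a: str
    prompt_b: str

def counterfactually_aligned(pair: CounterfactualPair) -> bool:
    """Return True if the two prompts differ only in the demographic term,
    i.e. substituting group_a with group_b in prompt_a reproduces prompt_b."""
    return pair.prompt_a.replace(pair.group_a, pair.group_b) == pair.prompt_b

# Illustrative pair (invented content, not drawn from the dataset).
pair = CounterfactualPair(
    topic="job interview preparation",
    bias_dimension="gender",
    intent="Recommendation",
    group_a="a male engineer",
    group_b="a female engineer",
    prompt_a="Recommend interview questions for a male engineer applying to a startup.",
    prompt_b="Recommend interview questions for a female engineer applying to a startup.",
)
print(counterfactually_aligned(pair))  # True
```

A fairness probe would then send both prompts of an aligned pair to the model under test and compare the responses; any systematic divergence attributable to the swapped group term is the counterfactual inconsistency the benchmark is built to surface.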
Problem

Research questions and friction points this paper is trying to address.

counterfactual fairness
stereotyped prompts
large language models
demographic bias
communicative intent
Innovation

Methods, ideas, or system contributions that make the work stand out.

counterfactual fairness
stereotyped prompts
large language models
communicative intent
bias evaluation
πŸ”Ž Similar Papers
No similar papers found.