SCOPE: A Dataset of Stereotyped Prompts for Counterfactual Fairness Assessment of LLMs

πŸ“… 2026-04-07
πŸ“ˆ Citations: 0
✨ Influential: 0
πŸ“„ PDF
πŸ€– AI Summary
Existing fairness evaluation datasets for large language models often suffer from linguistic homogeneity, limited thematic coverage, and a neglect of communicative intent. To address these limitations, this work introduces SCOPE, a novel dataset that incorporates communicative intent as a new dimension within large-scale counterfactual prompt pairs. Constructed through a multidimensional control strategy, SCOPE comprises 241,280 prompts (120,640 counterfactual pairs) spanning 1,438 topics, nine bias dimensions, 1,536 demographic groups, and four types of communicative intent, ensuring semantic alignment across prompts. This resource provides a fine-grained, realistic benchmark for evaluating model fairness, robustness, and counterfactual consistency in contexts that closely mirror authentic human–AI interactions.
πŸ“ Abstract
Large Language Models (LLMs) now serve as the foundation for a wide range of applications, from conversational assistants to decision-support tools, making fairness in their outputs increasingly important. Previous studies have shown that LLM outputs can shift when prompts reference different demographic groups, even when intent and semantic content remain constant. However, existing resources for probing such disparities rely primarily on small, template-based counterfactual examples or fixed sentence pairs. These benchmarks offer limited linguistic diversity, narrow topical coverage, and little support for analyzing how communicative intent affects model behavior. To address these limitations, we introduce SCOPE (Stereotype-COnditioned Prompts for Evaluation), a large-scale dataset of counterfactual prompt pairs designed to enable systematic investigation of group-sensitive behavior in LLMs. SCOPE contains 241,280 prompts organized into 120,640 counterfactual pairs, each grounded in one of 1,438 topics and spanning nine bias dimensions and 1,536 demographic groups. All prompts are generated under four distinct communicative intents (Question, Recommendation, Direction, and Clarification), ensuring broad coverage of common interaction styles. This resource provides a controlled, semantically aligned, and intent-aware basis for evaluating fairness, robustness, and counterfactual consistency.
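To make the dataset design concrete, the sketch below models one counterfactual pair and checks the semantic-alignment property the abstract describes: the two prompts should differ only in the demographic group referenced, with topic, bias dimension, and communicative intent held fixed. The record schema and field names here are assumptions for illustration, not the published SCOPE format.

```python
from dataclasses import dataclass

@dataclass
class CounterfactualPair:
    # Hypothetical record layout; the actual SCOPE schema may differ.
    topic: str            # one of the dataset's 1,438 topics
    bias_dimension: str   # one of the nine bias dimensions
    intent: str           # Question | Recommendation | Direction | Clarification
    group_a: str          # demographic group in the first prompt
    group_b: str          # demographic group in the second prompt
    prompt_a: str
    prompt_b: str

def counterfactually_aligned(pair: CounterfactualPair) -> bool:
    """Return True if the two prompts differ only in the demographic term,
    i.e. substituting group_a with group_b in prompt_a reproduces prompt_b."""
    return pair.prompt_a.replace(pair.group_a, pair.group_b) == pair.prompt_b

# Illustrative pair (invented content, not drawn from the dataset).
pair = CounterfactualPair(
    topic="job interview preparation",
    bias_dimension="gender",
    intent="Recommendation",
    group_a="a male engineer",
    group_b="a female engineer",
    prompt_a="Recommend interview questions for a male engineer applying to a startup.",
    prompt_b="Recommend interview questions for a female engineer applying to a startup.",
)
print(counterfactually_aligned(pair))  # True
```

A fairness probe would then send both prompts of an aligned pair to the model under test and compare the responses; any systematic divergence attributable to the swapped group term is the counterfactual inconsistency the benchmark is built to surface.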
Problem

Research questions and friction points this paper is trying to address.

counterfactual fairness
stereotyped prompts
large language models
demographic bias
communicative intent
Innovation

Methods, ideas, or system contributions that make the work stand out.

counterfactual fairness
stereotyped prompts
large language models
communicative intent
bias evaluation
πŸ”Ž Similar Papers
No similar papers found.