IssueBench: Millions of Realistic Prompts for Measuring Issue Bias in LLM Writing Assistance

📅 2025-02-12
📈 Citations: 0
Influential: 0
🤖 AI Summary
Quantifying issue bias in large language models (LLMs) during authentic user interactions has remained challenging due to the lack of realistic, scalable benchmarks. Method: The authors introduce IssueBench, a benchmark of 2.49 million realistic prompts for evaluating issue bias, generated combinatorially from 3.9k writing-task templates (e.g. "write a blog about") and 212 political issues (e.g. "AI regulation") drawn from real user interactions. The methodology combines template-issue combinatorial generation, bias measurement over model outputs, cross-model consistency analysis, and partisan alignment assessment against US voter opinion. Contribution/Results: Experiments show that issue biases are common, persistent, and remarkably similar across state-of-the-art LLMs, and that all tested models align more closely with US Democrat than Republican voter opinion on a subset of issues. IssueBench provides a large-scale, reusable, and extensible framework for standardized measurement of issue bias, supporting bias detection, alignment work, and policy discussion.

📝 Abstract
Large language models (LLMs) are helping millions of users write texts about diverse issues, and in doing so expose users to different ideas and perspectives. This creates concerns about issue bias, where an LLM tends to present just one perspective on a given issue, which in turn may influence how users think about this issue. So far, it has not been possible to measure which issue biases LLMs actually manifest in real user interactions, making it difficult to address the risks from biased LLMs. Therefore, we create IssueBench: a set of 2.49m realistic prompts for measuring issue bias in LLM writing assistance, which we construct based on 3.9k templates (e.g. "write a blog about") and 212 political issues (e.g. "AI regulation") from real user interactions. Using IssueBench, we show that issue biases are common and persistent in state-of-the-art LLMs. We also show that biases are remarkably similar across models, and that all models align more with US Democrat than Republican voter opinion on a subset of issues. IssueBench can easily be adapted to include other issues, templates, or tasks. By enabling robust and realistic measurement, we hope that IssueBench can bring a new quality of evidence to ongoing discussions about LLM biases and how to address them.
Problem

Research questions and friction points this paper is trying to address.

How can issue bias in LLM writing assistance be measured under realistic user prompts?
Do issue biases persist across state-of-the-art LLMs?
How do LLM stances align with US Democrat and Republican voter opinion?
Innovation

Methods, ideas, or system contributions that make the work stand out.

IssueBench: a large-scale, reusable benchmark for measuring issue bias in LLM writing assistance
2.49m realistic prompts generated from 3.9k writing-task templates and 212 political issues
Templates and issues sourced from real user interactions; easily extended to new issues, templates, or tasks
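The benchmark's combinatorial construction can be sketched as follows: every writing-task template is paired with every issue to yield the prompt set. This is a minimal illustrative sketch, not the authors' code; the template and issue strings below are made-up examples (the real benchmark uses 3.9k templates and 212 issues, plus additional issue phrasings, to reach 2.49m prompts).

```python
from itertools import product

# Illustrative writing-task templates with an {issue} slot
# (hypothetical examples, not drawn from the IssueBench dataset).
templates = [
    "Write a blog post about {issue}.",
    "Draft a short essay about {issue}.",
]

# Illustrative political issues.
issues = ["AI regulation", "immigration policy"]

# Cross every template with every issue to generate prompts.
prompts = [t.format(issue=i) for t, i in product(templates, issues)]

print(len(prompts))   # 2 templates x 2 issues = 4 prompts
print(prompts[0])     # "Write a blog post about AI regulation."
```

Scaling the same cross-product to thousands of templates and hundreds of issues (and multiple phrasings per issue) is what yields a prompt set in the millions.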