Systematic Bias in Large Language Models: Discrepant Response Patterns in Binary vs. Continuous Judgment Tasks

📅 2025-04-28
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This study identifies a systematic negative judgment bias in large language models (LLMs) induced by response format—binary versus continuous—challenging the implicit assumption that model outputs depend solely on input. Method: Through controlled experiments across multiple open-source and commercial LLMs using rigorous prompt engineering, we evaluate this effect on value judgments and text sentiment analysis. Contribution/Results: Binary-format responses exhibit significantly higher negative classification rates than continuous formats—by 12.3%–18.7% on average—with high consistency across models and tasks. This is the first empirical demonstration that task framing alone can introduce reproducible, systematic bias in LLM outputs. The findings establish response format as a critical, often overlooked design variable in LLM-based decision-making applications, particularly in high-stakes domains such as psychological text analysis, where reliability and calibration are essential.

Technology Category

Application Category

📝 Abstract
Large Language Models (LLMs) are increasingly used in tasks such as psychological text analysis and decision-making in automated workflows. However, their reliability remains a concern due to potential biases inherited from their training process. In this study, we examine how different response format: binary versus continuous, may systematically influence LLMs' judgments. In a value statement judgments task and a text sentiment analysis task, we prompted LLMs to simulate human responses and tested both formats across several models, including both open-source and commercial models. Our findings revealed a consistent negative bias: LLMs were more likely to deliver"negative"judgments in binary formats compared to continuous ones. Control experiments further revealed that this pattern holds across both tasks. Our results highlight the importance of considering response format when applying LLMs to decision tasks, as small changes in task design can introduce systematic biases.
Problem

Research questions and friction points this paper is trying to address.

Examines bias in LLMs between binary and continuous response formats
Identifies consistent negative bias in binary judgments across tasks
Highlights impact of response format on LLM decision reliability
Innovation

Methods, ideas, or system contributions that make the work stand out.

Examined binary vs continuous response formats
Revealed consistent negative bias in binary judgments
Highlighted response format impact on LLM biases
🔎 Similar Papers
No similar papers found.
Yi-Long Lu
Yi-Long Lu
Peking University
decision makingproblem solvingcomputational modeling
C
Chunhui Zhang
State Key Laboratory of General Artificial Intelligence, BIGAI
W
Wei Wang
State Key Laboratory of General Artificial Intelligence, BIGAI