🤖 AI Summary
This study identifies a systematic negative judgment bias in large language models (LLMs) induced by response format (binary versus continuous), challenging the implicit assumption that model outputs depend solely on input content. Method: Through controlled experiments across multiple open-source and commercial LLMs using rigorous prompt engineering, we evaluate this effect on value-statement judgment and text sentiment analysis tasks. Contribution/Results: Binary-format responses exhibit significantly higher negative classification rates than continuous formats (12.3%–18.7% higher on average), with high consistency across models and tasks. This is the first empirical demonstration that task framing alone can introduce reproducible, systematic bias in LLM outputs. The findings establish response format as a critical, often overlooked design variable in LLM-based decision-making applications, particularly in high-stakes domains such as psychological text analysis, where reliability and calibration are essential.
📝 Abstract
Large Language Models (LLMs) are increasingly used in tasks such as psychological text analysis and decision-making in automated workflows. However, their reliability remains a concern due to potential biases inherited from their training process. In this study, we examine how different response formats (binary versus continuous) may systematically influence LLMs' judgments. In a value-statement judgment task and a text sentiment analysis task, we prompted LLMs to simulate human responses and tested both formats across several open-source and commercial models. Our findings revealed a consistent negative bias: LLMs were more likely to deliver "negative" judgments in binary formats than in continuous ones. Control experiments further showed that this pattern holds across both tasks. Our results highlight the importance of considering response format when applying LLMs to decision tasks, as small changes in task design can introduce systematic biases.
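To make the experimental manipulation concrete, the sketch below contrasts the two elicitation formats on the sentiment task. This is a minimal illustration, not the paper's actual protocol: `query_model`, the prompt wording, and the 0–100 scale with a midpoint threshold are all hypothetical placeholders standing in for whatever client, prompts, and scale the study used.

```python
# Minimal sketch of a binary-vs-continuous response-format comparison.
# `query_model` is a hypothetical stand-in for any LLM chat-completion
# client; the prompt wording and scale are illustrative assumptions.

def query_model(prompt: str) -> str:
    raise NotImplementedError("plug in your LLM client here")

BINARY_PROMPT = (
    "Classify the sentiment of the following text as exactly one word, "
    "'positive' or 'negative'.\n\nText: {text}\nAnswer:"
)

CONTINUOUS_PROMPT = (
    "Rate the sentiment of the following text on a scale from 0 "
    "(most negative) to 100 (most positive). Reply with the number only."
    "\n\nText: {text}\nAnswer:"
)

def negative_rates(texts: list[str], threshold: float = 50.0) -> tuple[float, float]:
    """Return the share of negative judgments under each response format."""
    binary_neg = continuous_neg = 0
    for text in texts:
        # Binary format: the model must commit to a discrete label.
        if "negative" in query_model(BINARY_PROMPT.format(text=text)).lower():
            binary_neg += 1
        # Continuous format: the model emits a score that is thresholded
        # afterwards, so the discretization happens outside the model.
        score = float(query_model(CONTINUOUS_PROMPT.format(text=text)))
        if score < threshold:
            continuous_neg += 1
    n = len(texts)
    return binary_neg / n, continuous_neg / n
```

Under a setup like this, the paper's finding corresponds to the binary negative rate systematically exceeding the continuous one on the same texts, across models.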