🤖 AI Summary
This study systematically evaluates 45 large language models (LLMs) across eight classic cognitive biases, including anchoring, availability, and confirmation bias. We introduce the first scalable, reproducible, mechanism-oriented evaluation framework, built upon a psychologist-curated dataset of 220 decision-making scenarios (yielding 2.8 million model responses). The framework integrates multiple-choice judgment tasks, human-designed prompt templates with automated augmentation, and controlled-variable prompting techniques. Results show that LLMs exhibit statistically significant bias-consistent behavior in 17.8%-57.3% of cases. Scaling model parameters beyond 32B reduces bias in 39.5% of scenarios, whereas increasing prompt specificity yields at most a 14.9% reduction. This work establishes the first rigorous, large-scale assessment of cognitive biases in LLMs, offering both theoretical foundations and methodological tools for developing trustworthy AI decision-making systems.
📝 Abstract
As Large Language Models (LLMs) are increasingly embedded in real-world decision-making processes, it becomes crucial to examine the extent to which they exhibit cognitive biases. Cognitive biases, extensively studied in psychology, are systematic distortions commonly observed in human judgment. This paper presents a large-scale evaluation of eight well-established cognitive biases across 45 LLMs, analyzing over 2.8 million LLM responses generated through controlled prompt variations. To achieve this, we introduce a novel evaluation framework based on multiple-choice tasks, hand-curate a dataset of 220 decision scenarios targeting fundamental cognitive biases in collaboration with psychologists, and propose a scalable approach for generating diverse prompts from human-authored scenario templates. Our analysis shows that LLMs exhibit bias-consistent behavior in 17.8%-57.3% of instances across a range of judgment and decision-making contexts targeting anchoring, availability, confirmation, framing, interpretation, overattribution, prospect-theory, and representativeness biases. We find that both model size and prompt specificity play a significant role in bias susceptibility: larger models (>32B parameters) show reduced bias in 39.5% of cases, while higher prompt detail reduces most biases by up to 14.9%, with one exception (overattribution), which is exacerbated by up to 8.8%.
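The controlled-variable prompting described above can be sketched in miniature: the same scenario template is rendered in a biased and a neutral condition, the model answers the identical multiple-choice question in both, and the shift in answer distributions is measured. The scenario wording, option labels, function names, and mock answers below are illustrative assumptions, not the paper's actual dataset or code.

```python
# Hypothetical sketch of controlled-variable prompting for anchoring bias.
# One scenario template is rendered twice: with a bias-inducing anchor cue
# and without. Comparing how often the anchor-consistent option is chosen
# in each condition estimates the bias effect for that scenario.

def render(condition: str) -> str:
    """Render the anchored or neutral variant of one scenario."""
    cue = "A colleague first suggested $900. " if condition == "anchored" else ""
    return (
        cue
        + "You are pricing a used laptop. Which listing price is most reasonable?\n"
        "(A) $300  (B) $500  (C) $700  (D) $900"
    )

def bias_shift(neutral_answers, anchored_answers, anchor_option="D"):
    """Increase in the rate of the anchor-consistent choice when the cue is shown."""
    rate = lambda xs: sum(a == anchor_option for a in xs) / len(xs)
    return rate(anchored_answers) - rate(neutral_answers)

# Mock model outputs; a real run would query an LLM once per rendered prompt.
neutral = ["B", "B", "C", "B", "C"]
anchored = ["D", "C", "D", "B", "D"]
print(render("anchored"))
print(f"anchor-consistent shift: {bias_shift(neutral, anchored):.1%}")
```

Repeating this comparison across many scenario renderings and models, and testing whether the shift differs significantly from zero, corresponds to the bias-consistency rates reported above.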