🤖 AI Summary
This work investigates confidence miscalibration in large language models (LLMs) on question-answering tasks: specifically, whether LLMs exhibit human-like difficulty sensitivity—underconfidence on easy items and overconfidence on hard ones—and whether socially grounded identity cues (e.g., expert vs. layperson, race, gender, age) induce systematic, accuracy-irrelevant confidence biases. To isolate confidence estimation from answer generation, we propose Answer-Free Confidence Estimation (AFCE), a two-stage prompting framework. Evaluating Llama-3-70B, Claude-3-Sonnet, and GPT-4o on MMLU and GPQA benchmarks, we find that LLMs’ confidence is largely insensitive to item difficulty and significantly distorted by identity prompts. AFCE achieves the first consistent calibration improvement across models and tasks: reducing mean calibration error by 38%, enhancing difficulty sensitivity, and aligning confidence distributions more closely with empirically observed human cognitive patterns.
📝 Abstract
Psychology research has shown that humans are poor at estimating their own performance on tasks, tending towards underconfidence on easy tasks and overconfidence on difficult ones. We examine three LLMs, Llama-3-70B-instruct, Claude-3-Sonnet, and GPT-4o, on a range of QA tasks of varying difficulty, and show that models exhibit subtle differences from human patterns of overconfidence: they are less sensitive to task difficulty, and when prompted to answer as different personas -- e.g., expert vs. layperson, or different races, genders, and ages -- they respond with stereotypically biased confidence estimates even though their underlying answer accuracy remains the same. Based on these observations, we propose Answer-Free Confidence Estimation (AFCE) to improve confidence calibration and LLM interpretability in these settings. AFCE is a self-assessment method that employs two stages of prompting: first eliciting only a confidence score on a question, then asking separately for the answer. Experiments on the MMLU and GPQA datasets, spanning a range of subjects and difficulty levels, show that this separation of tasks significantly reduces overconfidence and yields more human-like sensitivity to task difficulty.
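The two-stage prompting described above can be sketched as follows. This is a minimal illustration, not the authors' released code: `call_llm` is a hypothetical stand-in (here a stub) for any chat-completion client, and the prompt wording is an assumption rather than the exact phrasing used in the paper.

```python
def call_llm(prompt: str) -> str:
    # Stub standing in for a real LLM API call (e.g. an OpenAI or
    # Anthropic client); replies depend only on which stage is asked.
    if "confidence" in prompt.lower():
        return "40"
    return "B"

def afce(question: str, options: list[str]) -> tuple[int, str]:
    """Answer-Free Confidence Estimation: elicit confidence first,
    then request the answer in a separate, independent prompt."""
    opts = "\n".join(f"{chr(65 + i)}. {o}" for i, o in enumerate(options))

    # Stage 1: confidence only. The model never sees (or generates) an
    # answer here, so its self-assessment cannot anchor on one.
    conf_prompt = (
        f"Question:\n{question}\n{opts}\n\n"
        "Without answering, state your confidence (0-100) that you could "
        "answer this question correctly. Reply with a number only."
    )
    confidence = int(call_llm(conf_prompt))

    # Stage 2: the answer itself, asked separately.
    ans_prompt = (
        f"Question:\n{question}\n{opts}\n\n"
        "Reply with the letter of the best option only."
    )
    answer = call_llm(ans_prompt)
    return confidence, answer

conf, ans = afce("Which planet is largest?",
                 ["Mars", "Jupiter", "Venus", "Mercury"])
print(conf, ans)  # prints "40 B" with this stub
```

The key design choice is that the two prompts are independent contexts, so the reported confidence reflects the question's perceived difficulty rather than post-hoc justification of a generated answer.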