Bias in the Loop: How Humans Evaluate AI-Generated Suggestions

📅 2025-09-10
📈 Citations: 0
Influential: 0
🤖 AI Summary
Human-AI collaborative decision-making (e.g., medical diagnosis, content moderation) is increasingly prevalent, yet AI recommendations often induce cognitive biases that impair collaborative efficacy. Method: We conducted a randomized controlled experiment integrating behavioral data analysis and multi-dimensional performance evaluation to systematically examine how task design—specifically recommendation quality, cognitive load, and incentive structures—and individual traits—particularly attitudes toward AI—affect collaborative decision outcomes. Contribution/Results: Contrary to conventional assumptions, mandatory error-correction instructions reduced engagement and increased erroneous acceptance of AI suggestions. Crucially, individual AI trust emerged as the strongest psychological predictor of decision accuracy: skeptics exhibited greater deliberation and higher accuracy, whereas over-trusting users were more prone to uncritically accepting incorrect recommendations. This study challenges the “automation-as-enhancement” heuristic and provides the first empirical evidence of attitude-driven cognitive mechanisms in human-AI collaboration, offering both theoretical grounding and a design framework for psychologically informed human-AI workflow optimization.

📝 Abstract
Human-AI collaboration increasingly drives decision-making across industries, from medical diagnosis to content moderation. While AI systems promise efficiency gains by providing automated suggestions for human review, these workflows can trigger cognitive biases that degrade performance. We know little about the psychological factors that determine when these collaborations succeed or fail. We conducted a randomized experiment with 2,784 participants to examine how task design and individual characteristics shape human responses to AI-generated suggestions. Using a controlled annotation task, we manipulated three factors: AI suggestion quality in the first three instances, task burden through required corrections, and performance-based financial incentives. We collected demographics, attitudes toward AI, and behavioral data to assess four performance metrics: accuracy, correction activity, overcorrection, and undercorrection. Two patterns emerged that challenge conventional assumptions about human-AI collaboration. First, requiring corrections for flagged AI errors reduced engagement and increased the tendency to accept incorrect suggestions, demonstrating how cognitive shortcuts influence collaborative outcomes. Second, individual attitudes toward AI emerged as the strongest predictor of performance, surpassing demographic factors. Participants skeptical of AI detected errors more reliably and achieved higher accuracy, while those favorable toward automation exhibited dangerous overreliance on algorithmic suggestions. The findings reveal that successful human-AI collaboration depends not only on algorithmic performance but also on who reviews AI outputs and how review processes are structured. Effective human-AI collaborations require consideration of human psychology: selecting diverse evaluator samples, measuring attitudes, and designing workflows that counteract cognitive biases.
Problem

Research questions and friction points this paper is trying to address.

Examining cognitive biases in human-AI collaborative decision-making workflows
Investigating how task design and individual traits affect AI suggestion responses
Identifying psychological factors determining success or failure of human-AI collaboration
Innovation

Methods, ideas, or system contributions that make the work stand out.

Randomized experiment with 2,784 participants
Manipulated AI quality, task burden, incentives
Measured attitudes, demographics, and behavioral metrics
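The abstract names four performance metrics: accuracy, correction activity, overcorrection, and undercorrection. A minimal sketch of how such metrics could be computed from annotation data is below; the function name and the exact operationalizations are assumptions for illustration, not the paper's formulas.

```python
# Hypothetical scoring of one participant's annotation session against
# AI suggestions and gold-standard labels. Metric definitions are an
# assumption based on the abstract, not taken from the paper itself.

def score_annotations(ai_suggestions, final_labels, true_labels):
    n = len(true_labels)
    # Did the participant change the AI's suggestion on each item?
    corrected = [a != f for a, f in zip(ai_suggestions, final_labels)]
    # Was the AI's suggestion wrong on each item?
    ai_wrong = [a != t for a, t in zip(ai_suggestions, true_labels)]
    return {
        # Share of items the participant ultimately labeled correctly.
        "accuracy": sum(f == t for f, t in zip(final_labels, true_labels)) / n,
        # Share of items where the participant changed the AI suggestion.
        "correction_activity": sum(corrected) / n,
        # Changed a suggestion that was already correct (needless edit).
        "overcorrection": sum(c and not w for c, w in zip(corrected, ai_wrong)) / n,
        # Kept a suggestion that was wrong (missed AI error).
        "undercorrection": sum(w and not c for c, w in zip(corrected, ai_wrong)) / n,
    }

# Example: 4 items; the AI errs on items 2 and 3. The participant fixes
# item 2, misses item 3, and needlessly changes item 4.
scores = score_annotations(
    ai_suggestions=["A", "B", "B", "A"],
    final_labels=["A", "A", "B", "B"],
    true_labels=["A", "A", "A", "A"],
)
# accuracy 0.5, correction_activity 0.5, overcorrection 0.25, undercorrection 0.25
```

Under these definitions, overcorrection and undercorrection partition the participant's errors by whether the AI suggestion was originally right or wrong, which is what lets the study separate skeptical over-editing from uncritical acceptance.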
Jacob Beck
LMU Munich, Department of Statistics; Munich Center for Machine Learning
Stephanie Eckman
Amazon
Data Collection · Data Quality · Passive Data · Motivated Misreporting in Surveys
Christoph Kern
LMU Munich, Department of Statistics; Munich Center for Machine Learning; University of Maryland, Joint Program in Survey Methodology
Frauke Kreuter
Professor of Survey Methodology, University of Maryland
Nonresponse · Interviewer · Paradata · Measurement Error