Human Bias in the Face of AI: The Role of Human Judgement in AI Generated Text Evaluation

📅 2024-09-29
🏛️ arXiv.org
📈 Citations: 0
Influential: 0
🤖 AI Summary
Human evaluation of AI-generated text is subject to a substantial label-driven negative bias—distinct from actual quality differences and independent of textual merit. Method: Three double-blind, controlled experiments (text rewriting, news summarization, persuasive writing) systematically manipulated “AI-generated” versus “human-written” labels while holding content constant; preference ratings and behavioral responses were measured across conditions, including deliberate label misattribution. Results: Participants exhibited >30% lower preference for identically rated texts labeled as AI-generated; this bias persisted even under label misattribution, confirming that source attribution—not intrinsic quality—drives evaluation. These findings reveal a critical cognitive bottleneck in human-AI collaboration: evaluators rely predominantly on provenance cues rather than semantic or functional quality. This study provides the first cross-task, replicable behavioral evidence quantifying label bias in AI credibility assessment, establishing an empirical foundation for shifting AI evaluation paradigms toward content-centric, quality-based frameworks.

📝 Abstract
As AI advances in text generation, human trust in AI-generated content remains constrained by biases that go beyond concerns of accuracy. This study explores how bias shapes the perception of AI- versus human-generated content. Through three experiments involving text rephrasing, news article summarization, and persuasive writing, we investigated how human raters respond to labeled and unlabeled content. While the raters could not differentiate the two types of texts in the blind test, they overwhelmingly favored content labeled as "Human Generated" over content labeled "AI Generated," by a preference score of over 30%. We observed the same pattern even when the labels were deliberately swapped. This human bias against AI has broader societal and cognitive implications, as it undervalues AI performance. This study highlights the limitations of human judgment in interacting with AI and offers a foundation for improving human-AI collaboration, especially in creative fields.
Problem

Research questions and friction points this paper is trying to address.

Human bias affects perception of AI-generated text
Preference for human-labeled content despite equal quality
Undervaluing AI performance impacts human-AI collaboration
Innovation

Methods, ideas, or system contributions that make the work stand out.

Blind test comparison of AI and human texts
Label swapping to measure human bias
Preference score analysis for AI content
Tiffany Zhu
The Harker School
Iain Xie Weissburg
University of California, Santa Barbara
Kexun Zhang
Carnegie Mellon University
W. Wang
University of California, Santa Barbara