🤖 AI Summary
This study investigates the reliability of large language models (LLMs) as judges and demonstrates that their trustworthiness assessments are systematically biased by source labels. Through counterfactual experiments integrating eye-tracking, attention analysis, and logits-based uncertainty measures, the work reveals, for the first time, a shared heuristic mechanism: both humans and LLMs disproportionately trust content labeled as "human-generated," regardless of its actual origin. During judgment, LLM judges allocate greater attention to the label region than to the content region and exhibit higher decision uncertainty when presented with "AI-generated" labels. By linking behavioral outcomes to underlying cognitive and internal model mechanisms, this research provides an empirical foundation for understanding and calibrating the reliability of LLMs as evaluative agents.
📝 Abstract
Large language models (LLMs) are increasingly used as automated evaluators (LLM-as-a-Judge). This work challenges the reliability of that paradigm by showing that LLM trust judgments are biased by disclosed source labels. Using a counterfactual design, we find that both humans and LLM judges assign higher trust to information labeled as human-authored than to the same content labeled as AI-generated. Eye-tracking data reveal that humans rely heavily on source labels as heuristic cues when judging. Analyzing LLM internal states during judgment, we find that, across label conditions, models allocate denser attention to the label region than to the content region, and that this label dominance is stronger under Human labels than under AI labels, consistent with human gaze patterns. In addition, decision uncertainty measured from the logits is higher under AI labels than under Human labels. These results indicate that the source label is a salient heuristic cue for both humans and LLMs, raising validity concerns for label-sensitive LLM-as-a-Judge evaluation. We cautiously suggest that aligning models with human preferences may propagate human heuristic reliance into models, motivating debiased evaluation and alignment.
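To make the two internal measures concrete, the following is a minimal sketch, not the authors' exact protocol: it estimates per-token attention "density" on the source-label span versus the content span, and decision uncertainty from the logits at the judgment position. The model name, prompt template, aggregation choices (mean over layers and heads, attention from the final prompt token), and the Yes/No answer tokens are all illustrative assumptions.

```python
# Illustrative sketch of label-vs-content attention density and logits-based
# decision uncertainty for an LLM trust judgment. Assumptions: a HuggingFace
# causal LM with a fast tokenizer; "Yes"/"No" as the decision tokens.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "meta-llama/Llama-3.1-8B-Instruct"  # assumption; any causal LM works
tok = AutoTokenizer.from_pretrained(model_name)
# "eager" attention so output_attentions is supported.
model = AutoModelForCausalLM.from_pretrained(model_name, attn_implementation="eager")
model.eval()

label = "Source: Human-generated."  # swap with "Source: AI-generated." for the counterfactual
content = "The Great Wall of China is visible from low Earth orbit."
prompt = f"{label}\n{content}\nDo you trust this statement? Answer Yes or No:"

# Offset mapping (fast tokenizers only) lets us map character spans to token indices.
enc = tok(prompt, return_tensors="pt", return_offsets_mapping=True)
offsets = enc.pop("offset_mapping")[0]  # (seq_len, 2) char span per token

def token_indices(span_start, span_end):
    """Token positions whose character span falls inside [span_start, span_end)."""
    return [i for i, (a, b) in enumerate(offsets.tolist())
            if a >= span_start and b <= span_end and a < b]

label_ids = token_indices(prompt.find(label), prompt.find(label) + len(label))
content_ids = token_indices(prompt.find(content), prompt.find(content) + len(content))

with torch.no_grad():
    out = model(**enc, output_attentions=True)

# out.attentions: tuple of (batch, heads, seq, seq) per layer.
# Average over layers and heads, then take the attention row of the final
# prompt token, i.e. the position where the trust decision is produced.
attn = torch.stack(out.attentions).mean(dim=(0, 2))[0]  # (seq, seq)
from_decision = attn[-1]

# "Density" = mean attention per token in each region, so span length
# does not mechanically favor the longer content span.
label_density = from_decision[label_ids].mean().item()
content_density = from_decision[content_ids].mean().item()

# Decision uncertainty: binary entropy over the Yes/No next-token logits.
logits = out.logits[0, -1]
yes_id = tok(" Yes", add_special_tokens=False)["input_ids"][0]
no_id = tok(" No", add_special_tokens=False)["input_ids"][0]
p = torch.softmax(logits[[yes_id, no_id]], dim=-1)
entropy = -(p * p.log()).sum().item()

print(f"attention density  label={label_density:.4f}  content={content_density:.4f}")
print(f"P(Yes)={p[0]:.3f}  decision entropy={entropy:.3f} nats")
```

Running this once per label condition on identical content yields the counterfactual comparison the abstract describes: higher label-region density under Human labels, and higher decision entropy under AI labels, would mirror the reported pattern.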