Exploring the Effects of Alignment on Numerical Bias in Large Language Models

📅 2026-01-23
📈 Citations: 0
Influential: 0
🤖 AI Summary
This study investigates the causal relationship between alignment training—such as instruction tuning and preference tuning—and numerical bias in large language models (LLMs) when used as evaluators (LLM-as-a-judge), a phenomenon where models exhibit a tendency to favor specific score values, thereby compromising evaluation reliability. Through a comparative analysis of model outputs before and after alignment, the authors show that alignment significantly exacerbates numerical bias. They then conduct mitigation experiments with temperature scaling, distribution calibration, and score range adjustment, and find that score range adjustment is the most effective intervention: it not only substantially reduces bias but also improves overall evaluation performance, despite its heuristic nature. This work provides both empirical insights and practical solutions for understanding and mitigating bias in LLM-based evaluation.

📝 Abstract
"LLM-as-a-judge," which utilizes large language models (LLMs) as evaluators, has proven effective in many evaluation tasks. However, evaluator LLMs exhibit numerical bias, a phenomenon where certain evaluation scores are generated disproportionately often, leading to reduced evaluation performance. This study investigates the cause of this bias. Given that most evaluator LLMs are aligned through instruction tuning and preference tuning, and that prior research suggests alignment reduces output diversity, we hypothesize that numerical bias arises from alignment. To test this, we compare outputs from pre- and post-alignment LLMs, and observe that alignment indeed increases numerical bias. We also explore mitigation strategies for post-alignment LLMs, including temperature scaling, distribution calibration, and score range adjustment. Among these, score range adjustment is most effective in reducing bias and improving performance, though still heuristic. Our findings highlight the need for further work on optimal score range selection and more robust mitigation strategies.
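The mitigation strategies named above operate on the judge model's distribution over candidate scores. As a minimal sketch of one of them, temperature scaling, consider the following (the logits are made up for illustration and are not from the paper; a sharply peaked distribution stands in for numerical bias toward one score value):

```python
import math

def softmax(logits, temperature=1.0):
    """Turn raw logits into a probability distribution.

    A temperature above 1.0 flattens the distribution, which is one
    way to counteract a judge LLM's tendency to concentrate mass on
    a few favored score values (numerical bias).
    """
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical logits a judge LLM assigns to scores 1..5,
# heavily peaked on score 4.
logits = [0.1, 0.5, 1.0, 4.0, 1.2]
scores = [1, 2, 3, 4, 5]

for t in (1.0, 2.0):
    probs = softmax(logits, temperature=t)
    expected = sum(s * p for s, p in zip(scores, probs))
    print(f"T={t}: probs={[round(p, 3) for p in probs]}, "
          f"expected score={round(expected, 3)}")
```

At T=2.0 the peak on score 4 is less dominant than at T=1.0, so the expected score moves toward the other candidates; this softening is the intuition behind temperature scaling as a bias mitigation, though the abstract notes it was less effective than score range adjustment.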
Problem

Research questions and friction points this paper is trying to address.

numerical bias
large language models
LLM-as-a-judge
alignment
evaluation bias
Innovation

Methods, ideas, or system contributions that make the work stand out.

numerical bias
LLM-as-a-judge
alignment
score range adjustment
distribution calibration
Ayako Sato
Tokyo Metropolitan University, Hitotsubashi University, CyberAgent Inc.
Hwichan Kim
Tokyo Metropolitan University, Hitotsubashi University
Zhousi Chen
Tokyo Metropolitan University, Hitotsubashi University
Masato Mita
Recruit Co., Ltd.
Natural Language Processing · Computational Psycholinguistics
Mamoru Komachi
Professor at Hitotsubashi University
Computational Linguistics · Natural Language Processing · Machine Learning · Deep Learning