🤖 AI Summary
This study investigates the causal relationship between alignment training—such as instruction tuning and preference tuning—and numerical bias in large language models (LLMs) used as evaluators (LLM-as-a-judge), a phenomenon where models favor specific score values, compromising evaluation reliability. By comparing model outputs before and after alignment, the authors show that alignment significantly exacerbates numerical bias. They then conduct mitigation experiments with temperature scaling, distribution calibration, and score range adjustment, and find that score range adjustment is the most effective intervention: despite its heuristic nature, it not only substantially reduces bias but also improves overall evaluation performance. This work provides both empirical insights and practical solutions for understanding and mitigating bias in LLM-based evaluation.
📝 Abstract
"LLM-as-a-judge," which utilizes large language models (LLMs) as evaluators, has proven effective in many evaluation tasks. However, evaluator LLMs exhibit numerical bias, a phenomenon where certain evaluation scores are generated disproportionately often, leading to reduced evaluation performance. This study investigates the cause of this bias. Given that most evaluator LLMs are aligned through instruction tuning and preference tuning, and that prior research suggests alignment reduces output diversity, we hypothesize that numerical bias arises from alignment. To test this, we compare outputs from pre- and post-alignment LLMs, and observe that alignment indeed increases numerical bias. We also explore mitigation strategies for post-alignment LLMs, including temperature scaling, distribution calibration, and score range adjustment. Among these, score range adjustment is most effective in reducing bias and improving performance, though still heuristic. Our findings highlight the need for further work on optimal score range selection and more robust mitigation strategies.
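To make the mitigation strategies concrete, here is a minimal sketch of the first one, temperature scaling, as it would apply to a judge model's per-score distribution. This is an illustration, not the paper's implementation: the logit values and the `temperature_scaled_scores` helper are hypothetical, and it assumes access to the model's raw logits over the candidate score tokens.

```python
import math

def temperature_scaled_scores(score_logits, temperature=2.0):
    """Softmax over per-score logits with temperature scaling.

    score_logits: dict mapping candidate score -> raw logit.
    A temperature > 1 flattens the distribution, weakening the
    judge's tendency to over-produce a few favored score values.
    """
    scaled = {s: l / temperature for s, l in score_logits.items()}
    m = max(scaled.values())  # subtract max for numerical stability
    exps = {s: math.exp(l - m) for s, l in scaled.items()}
    z = sum(exps.values())
    return {s: e / z for s, e in exps.items()}

# Hypothetical logits for a judge that strongly favors score 4.
logits = {1: 0.1, 2: 0.5, 3: 1.0, 4: 3.0, 5: 1.2}
probs_t1 = temperature_scaled_scores(logits, temperature=1.0)
probs_t2 = temperature_scaled_scores(logits, temperature=2.0)
# Raising the temperature shifts probability mass away from the
# favored score toward the rest of the range.
assert probs_t2[4] < probs_t1[4]
```

Distribution calibration and score range adjustment operate differently: the former reweights the score distribution against a reference, while the latter simply changes the numeric range the judge is prompted to use, which the abstract reports as the most effective despite being heuristic.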