Assistant-Guided Mitigation of Teacher Preference Bias in LLM-as-a-Judge

πŸ“… 2025-05-25
πŸ“ˆ Citations: 0
✨ Influential: 0
πŸ“„ PDF
πŸ€– AI Summary
In LLM-as-a-Judge settings, proxy judge models often inherit preference biases from teacher-model-generated annotations. To address this, the paper proposes Assistant-Guided Debiasing for Judgment (AGDe-Judge), a three-stage debiasing paradigm: (1) collaborating with an unbiased assistant model to construct training data, (2) disentangling supervision signals to separate label-level and feedback-level biases, and (3) feedback-aware adversarial fine-tuning. AGDe-Judge jointly corrects bias at both the label and feedback levels, preventing the proxy judge from implicitly learning a preference for teacher outputs. Evaluated on six mainstream benchmarks, it reduces teacher-induced preference bias by an average of 32.7% while maintaining strong alignment with human judgments (Kendall’s τ ≥ 0.81), establishing a principled pathway toward trustworthy automated evaluation.

πŸ“ Abstract
LLM-as-a-Judge employs large language models (LLMs), such as GPT-4, to evaluate the quality of LLM-generated responses, gaining popularity for its cost-effectiveness and strong alignment with human evaluations. However, training proxy judge models using evaluation data generated by powerful teacher models introduces a critical yet previously overlooked issue: teacher preference bias, where the proxy judge model learns a biased preference for responses from the teacher model. To tackle this problem, we propose a novel setting that incorporates an additional assistant model, which is not biased toward the teacher model's responses, to complement the training data. Building on this setup, we introduce AGDe-Judge, a three-stage framework designed to debias both the labels and the feedback in the training data. Extensive experiments demonstrate that AGDe-Judge effectively reduces teacher preference bias while maintaining strong performance across six evaluation benchmarks. Code is available at https://github.com/Liuz233/AGDe-Judge.
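The abstract describes a pipeline in which an assistant judge, not biased toward the teacher's responses, complements teacher-generated annotations before the proxy judge is trained. The sketch below is a minimal, hypothetical illustration of that idea: all function names, the agreement-filtering heuristic, and the feedback-rewriting step are illustrative assumptions, not the authors' actual implementation.

```python
# Hypothetical sketch of assistant-guided debiasing of judge training data.
# Function names, data format, and heuristics are illustrative only.

def collect_annotations(pairs, teacher, assistant):
    """Stage 1 (assumed): label each response pair with both the teacher
    and an assistant model not biased toward teacher-authored responses."""
    return [{"pair": p, "teacher": teacher(p), "assistant": assistant(p)}
            for p in pairs]

def debias_labels(annotations):
    """Stage 2 (assumed): keep labels where the two judges agree; flag
    disagreements, a possible symptom of teacher preference bias."""
    clean, flagged = [], []
    for a in annotations:
        (clean if a["teacher"] == a["assistant"] else flagged).append(a)
    return clean, flagged

def debias_feedback(clean):
    """Stage 3 (assumed): strip teacher-identifying cues from the textual
    feedback so the proxy judge cannot learn 'teacher-style means better'."""
    return [{**a,
             "feedback": a["pair"]["feedback"].replace(
                 "as the teacher model", "as one of the models")}
            for a in clean]

# Toy usage with stubbed judges: the two judges disagree on pair 2.
pairs = [
    {"id": 1, "feedback": "Response A wins, as the teacher model answered."},
    {"id": 2, "feedback": "Response B is more complete."},
]
teacher = lambda p: "A"                       # always prefers response A
assistant = lambda p: "A" if p["id"] == 1 else "B"

annotations = collect_annotations(pairs, teacher, assistant)
clean, flagged = debias_labels(annotations)   # pair 2 is flagged
rewritten = debias_feedback(clean)            # teacher cue removed from pair 1
```

The real framework is more involved (the summary mentions adversarial fine-tuning as the final stage), but the sketch captures the core intuition: a second, unbiased annotator makes teacher-skewed labels and feedback detectable and correctable before training.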
Problem

Research questions and friction points this paper is trying to address.

Mitigates teacher preference bias in LLM-as-a-Judge evaluations
Uses assistant model to complement biased training data
Debiases both labels and feedback via a three-stage framework
Innovation

Methods, ideas, or system contributions that make the work stand out.

Introduces an unbiased assistant model to complement teacher-generated training data
Proposes AGDe-Judge, a three-stage debiasing framework
Jointly combats bias in both labels and feedback
πŸ”Ž Similar Papers
No similar papers found.
👥 Authors
Zhuo Liu (University of Science and Technology of China)
Moxin Li (National University of Singapore)
Xun Deng (University of Science and Technology of China)
Qifan Wang (Meta AI)
Fuli Feng (University of Science and Technology of China)