Assistant-Guided Mitigation of Teacher Preference Bias in LLM-as-a-Judge

📅 2025-05-25

📈 Citations: 0

✨ Influential: 0

career value

176K/year

🤖 AI Summary

In LLM-as-a-Judge settings, agent evaluation models often inherit preference biases from teacher-model-generated annotations. To address this, we propose Assistant-Guided Debiasing for Judgment (AGDe-Judge), a novel three-stage debiasing paradigm: (1) unbiased assistant collaboration for training data construction, (2) supervision signal disentanglement to separate label-level and feedback-level biases, and (3) feedback-aware adversarial fine-tuning. AGDe-Judge is the first method to jointly correct biases at both the label and feedback layers, eliminating implicit preference modeling of teacher outputs. Evaluated on six mainstream benchmarks, it reduces teacher-induced preference bias by an average of 32.7% while maintaining strong alignment with human judgments (Kendall’s τ ≥ 0.81). This work establishes a scalable, interpretable, and principled pathway toward trustworthy automated evaluation.

Technology Category

Application Category

📝 Abstract

LLM-as-a-Judge employs large language models (LLMs), such as GPT-4, to evaluate the quality of LLM-generated responses, gaining popularity for its cost-effectiveness and strong alignment with human evaluations. However, training proxy judge models using evaluation data generated by powerful teacher models introduces a critical yet previously overlooked issue: teacher preference bias, where the proxy judge model learns a biased preference for responses from the teacher model. To tackle this problem, we propose a novel setting that incorporates an additional assistant model, which is not biased toward the teacher model's responses, to complement the training data. Building on this setup, we introduce AGDe-Judge, a three-stage framework designed to debias from both the labels and feedbacks in the training data. Extensive experiments demonstrate that AGDe-Judge effectively reduces teacher preference bias while maintaining strong performance across six evaluation benchmarks. Code is available at https://github.com/Liuz233/AGDe-Judge.

Problem

Research questions and friction points this paper is trying to address.

Mitigates teacher preference bias in LLM-as-a-Judge evaluations

Uses assistant model to complement biased training data

Debiases labels and feedbacks via three-stage framework

Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses assistant model to reduce bias

Three-stage debiasing framework AGDe-Judge

Combats bias in labels and feedback

🔎 Similar Papers

No similar papers found.