FairJudge: An Adaptive, Debiased, and Consistent LLM-as-a-Judge

📅 2026-02-06
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the susceptibility of existing large language models (LLMs) as evaluators to non-semantic cues—such as response position, length, and formatting—which leads to poor adaptability and inconsistency across evaluation settings. To mitigate these issues, the authors propose modeling the judging behavior as a learnable and regularized policy. They construct a high-information-density evaluation dataset and introduce a curriculum-based, multi-stage alignment framework that integrates supervised fine-tuning (SFT), direct preference optimization (DPO), and group relative policy optimization (GRPO). This co-optimized approach effectively suppresses non-semantic biases and enhances cross-modal consistency. Experimental results demonstrate that the proposed method significantly outperforms larger instruction-tuned models on multiple internal and external benchmarks, achieving substantial improvements in both judging consistency—measured by F1 score and agreement rate—and debiasing capability.

📝 Abstract
Existing LLM-as-a-Judge systems suffer from three fundamental limitations: limited adaptivity to task- and domain-specific evaluation criteria, systematic biases driven by non-semantic cues such as position, length, format, and model provenance, and evaluation inconsistency that leads to contradictory judgments across different evaluation modes (e.g., pointwise versus pairwise). To address these issues, we propose FairJudge, an adaptive, debiased, and consistent LLM-as-a-Judge. Unlike prior approaches that treat the judge as a static evaluator, FairJudge models judging behavior itself as a learnable and regularized policy. From a data-centric perspective, we construct a high-information-density judging dataset that explicitly injects supervision signals aligned with evaluation behavior. Building on this dataset, we adopt a curriculum-style SFT-DPO-GRPO training paradigm that progressively aligns rubric adherence, bias mitigation, and cross-mode consistency, while avoiding catastrophic forgetting. Experimental results on multiple internal and public benchmarks show that FairJudge consistently improves agreement and F1, reduces non-semantic biases, and outperforms substantially larger instruction-tuned LLMs. All resources will be publicly released after acceptance to facilitate future research.
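The position bias the abstract targets is often countered at inference time by querying a judge under both response orderings and only accepting order-invariant verdicts. The sketch below illustrates that generic protocol; the `judge_fn` interface and tie-on-disagreement rule are illustrative assumptions, not FairJudge's actual training-based method.

```python
# Position-debiased pairwise judging: query the judge with both
# response orderings and keep the verdict only when it is
# order-invariant; otherwise declare a tie. This is a generic
# debiasing protocol, NOT the paper's SFT-DPO-GRPO pipeline.

def debiased_pairwise_judge(judge_fn, prompt, resp_a, resp_b):
    """judge_fn(prompt, first, second) -> 'first' | 'second' | 'tie'."""
    v1 = judge_fn(prompt, resp_a, resp_b)   # A shown first
    v2 = judge_fn(prompt, resp_b, resp_a)   # B shown first
    # Map the order-dependent labels back to concrete responses.
    win1 = {"first": "A", "second": "B", "tie": "tie"}[v1]
    win2 = {"first": "B", "second": "A", "tie": "tie"}[v2]
    # Consistent verdicts survive; disagreement signals position bias.
    return win1 if win1 == win2 else "tie"

# A toy judge with pure position bias (always prefers whatever is
# shown first): the protocol collapses its verdicts to ties.
position_biased = lambda p, first, second: "first"
print(debiased_pairwise_judge(position_biased, "q", "a", "b"))  # tie

# A toy judge with a length preference: its verdict is
# order-invariant, so it passes through unchanged.
length_pref = lambda p, first, second: (
    "first" if len(first) > len(second) else "second"
)
print(debiased_pairwise_judge(length_pref, "q", "long answer", "a"))  # A
```

This kind of two-pass protocol doubles inference cost, which is one motivation for the paper's alternative of training the bias out of the judging policy itself.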
Problem

Research questions and friction points this paper is trying to address.

LLM-as-a-Judge
evaluation bias
adaptivity
consistency
non-semantic cues
Innovation

Methods, ideas, or system contributions that make the work stand out.

LLM-as-a-Judge
bias mitigation
evaluation consistency
curriculum training
learnable judging policy
Bo Yang
College of Computer Science, Zhejiang University
Lanfei Feng
College of Computer Science, Zhejiang University
Yunkui Chen
College of Computer Science, Zhejiang University
Xiao Xu
College of Computer Science, Zhejiang University
Yu Zhang
Associate Professor, Zhejiang University
SLAM, 3D Vision, Robotics
Shijian Li
Zhejiang University
pervasive computing, human-computer interaction, artificial intelligence