Who's Your Judge? On the Detectability of LLM-Generated Judgments

📅 2025-09-29
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the challenge of distinguishing LLM-generated judgments from human judgments in sensitive domains such as academic peer review, this paper formally introduces the task of *text-free LLM judgment detection*—identifying the origin (human vs. LLM) of a judgment solely from its score and associated candidate content. To overcome the limitations of conventional text-based detection methods—which fail when no textual rationale is available—we propose J-Detector, a lightweight and interpretable detector. J-Detector jointly models score-content interactions by integrating explicit linguistic features with LLM-enhanced features, thereby quantifying inherent LLM scoring biases. Extensive experiments across diverse, multi-source datasets demonstrate that J-Detector significantly outperforms existing baselines in accuracy while providing transparent, feature-level interpretability. These results validate its practical utility and generalizability in real-world peer review settings.

📝 Abstract
Large Language Model (LLM)-based judgments leverage powerful LLMs to efficiently evaluate candidate content and provide judgment scores. However, the inherent biases and vulnerabilities of LLM-generated judgments raise concerns, underscoring the urgent need to distinguish them in sensitive scenarios like academic peer review. In this work, we propose and formalize the task of judgment detection and systematically investigate the detectability of LLM-generated judgments. Unlike LLM-generated text detection, judgment detection relies solely on judgment scores and candidates, reflecting real-world scenarios where textual feedback is often unavailable during detection. Our preliminary analysis shows that existing LLM-generated text detection methods perform poorly because they cannot capture the interaction between judgment scores and candidate content -- an aspect crucial for effective judgment detection. Inspired by this, we introduce *J-Detector*, a lightweight and transparent neural detector augmented with explicitly extracted linguistic and LLM-enhanced features to link LLM judges' biases with candidates' properties for accurate detection. Experiments across diverse datasets demonstrate the effectiveness of *J-Detector* and show how its interpretability enables quantifying biases in LLM judges. Finally, we analyze key factors affecting the detectability of LLM-generated judgments and validate the practical utility of judgment detection in real-world scenarios.
Problem

Research questions and friction points this paper is trying to address.

Detecting LLM-generated judgment scores in sensitive applications such as academic peer review
Quantifying and addressing biases in automated evaluation systems with a neural detector
Identifying LLM-judge vulnerabilities through interpretable detection methods
Innovation

Methods, ideas, or system contributions that make the work stand out.

Lightweight neural detector with explicit feature extraction
Linking judge biases to candidate properties for detection
Interpretable model quantifying biases in LLM judges