LLMs Cannot Reliably Judge (Yet?): A Comprehensive Assessment on the Robustness of LLM-as-a-Judge

📅 2025-06-11
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This paper addresses the lack of robustness evaluation for LLM-as-a-Judge systems under adversarial attacks, proposing RobustJudge—the first unified, automated, and scalable robustness assessment framework. Methodologically, it integrates diverse adversarial attacks (e.g., Combined Attack, PAIR), defense strategies (e.g., re-tokenization, LLM-based detectors), prompt template optimization, and cross-model comparative experiments to enable end-to-end quantitative analysis. Key contributions include: (1) the first empirical demonstration that prompt templates and judge model selection critically determine robustness; (2) experimental validation that mainstream LLM judges remain vulnerable to manipulation, while optimized prompts significantly enhance attack resistance; (3) open-sourcing of JudgeLM-13B, a high-performance judge model; and (4) discovery of previously undisclosed security vulnerabilities in real-world platforms, including Alibaba’s PAI.

Technology Category

Application Category

📝 Abstract
Large Language Models (LLMs) have demonstrated remarkable intelligence across various tasks, which has inspired the development and widespread adoption of LLM-as-a-Judge systems for automated model testing, such as red teaming and benchmarking. However, these systems are susceptible to adversarial attacks that can manipulate evaluation outcomes, raising concerns about their robustness and, consequently, their trustworthiness. Existing evaluation methods adopted by LLM-based judges are often piecemeal and lack a unified framework for comprehensive assessment. Furthermore, prompt template and model selections for improving judge robustness have been rarely explored, and their performance in real-world settings remains largely unverified. To address these gaps, we introduce RobustJudge, a fully automated and scalable framework designed to systematically evaluate the robustness of LLM-as-a-Judge systems. RobustJudge investigates the impact of attack methods and defense strategies (RQ1), explores the influence of prompt template and model selection (RQ2), and assesses the robustness of real-world LLM-as-a-Judge applications (RQ3).Our main findings are: (1) LLM-as-a-Judge systems are still vulnerable to a range of adversarial attacks, including Combined Attack and PAIR, while defense mechanisms such as Re-tokenization and LLM-based Detectors offer improved protection; (2) Robustness is highly sensitive to the choice of prompt template and judge models. Our proposed prompt template optimization method can improve robustness, and JudgeLM-13B demonstrates strong performance as a robust open-source judge; (3) Applying RobustJudge to Alibaba's PAI platform reveals previously unreported vulnerabilities. The source code of RobustJudge is provided at https://github.com/S3IC-Lab/RobustJudge.
Problem

Research questions and friction points this paper is trying to address.

Assessing LLM-as-a-Judge robustness against adversarial attacks
Exploring prompt template and model selection for judge reliability
Evaluating real-world vulnerabilities in LLM-as-a-Judge applications
Innovation

Methods, ideas, or system contributions that make the work stand out.

Automated framework RobustJudge evaluates LLM-as-a-Judge robustness
Investigates attack methods, defense strategies, and prompt templates
Optimizes prompt templates and selects robust judge models
🔎 Similar Papers
No similar papers found.
S
Songze Li
Southeast University, China
C
Chuokun Xu
Southeast University, China
J
Jiaying Wang
Southeast University, China
Xueluan Gong
Xueluan Gong
Nanyang Technological University
Computer science
C
Chen Chen
Nanyang Technological University, Singapore
J
Jirui Zhang
Southeast University, China
J
Jun Wang
OPPO Research Institute, China
Kwok-Yan Lam
Kwok-Yan Lam
Nanyang Technological University
CybersecurityPrivacy-Preserving technologiesDigital TrustDistributing systemsLegalTech
Shouling Ji
Shouling Ji
Professor, Zhejiang University & Georgia Institute of Technology
Data-driven SecurityAI SecuritySoftware ScurityPrivacy