🤖 AI Summary
This study systematically evaluates the security boundaries of large language models (LLMs) employed as rerankers, focusing on their vulnerability to prompt injection attacks that can maliciously manipulate ranking outcomes. It examines a range of model architectures (including encoder-decoder and decoder-only variants) and three ranking paradigms: pairwise, listwise, and setwise. By measuring attack success rate (ASR) and nDCG@10, the work analyzes two injection variants, decision objective hijacking and decision criteria hijacking, and reveals how model family, architecture, and deployment configuration critically influence susceptibility. Notably, encoder-decoder architectures demonstrate inherent robustness against such attacks, offering empirical guidance for designing secure LLM-based ranking systems. The authors publicly release their code and data to support further research.
📝 Abstract
Large Language Models (LLMs) have emerged as powerful re-rankers. Recent research has, however, shown that simple prompt injections embedded within a candidate document (i.e., jailbreak prompt attacks) can significantly alter an LLM's ranking decisions. While this poses serious security risks to LLM-based ranking pipelines, the extent to which this vulnerability persists across diverse LLM families, architectures, and settings remains largely under-explored. In this paper, we present a comprehensive empirical study of jailbreak prompt attacks against LLM rankers. We focus our evaluation on two complementary tasks: (1) Preference Vulnerability Assessment, measuring intrinsic susceptibility via attack success rate (ASR); and (2) Ranking Vulnerability Assessment, quantifying the operational impact on ranking quality (nDCG@10). We systematically examine three prevalent ranking paradigms (pairwise, listwise, setwise) under two injection variants: decision objective hijacking and decision criteria hijacking. Beyond reproducing prior findings, we expand the analysis to cover vulnerability scaling across model families, position sensitivity, backbone architectures, and cross-domain robustness. Our results characterize the boundary conditions of these vulnerabilities, revealing critical insights such as that encoder-decoder architectures exhibit strong inherent resilience to jailbreak attacks. We publicly release our code and additional experimental results at https://github.com/ielab/LLM-Ranker-Attack.
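The abstract's Ranking Vulnerability Assessment measures how much an injection degrades ranking quality via nDCG@10. As a minimal illustration (not the authors' code; the example relevance lists are hypothetical), nDCG@10 can be computed by discounting each document's graded relevance by its rank and normalizing against the ideal ordering:

```python
import math

def dcg_at_k(relevances, k=10):
    """Discounted cumulative gain over the top-k ranked items."""
    return sum(rel / math.log2(i + 2) for i, rel in enumerate(relevances[:k]))

def ndcg_at_k(ranked_relevances, k=10):
    """nDCG@k: DCG of the system ranking divided by the ideal DCG."""
    ideal_dcg = dcg_at_k(sorted(ranked_relevances, reverse=True), k)
    if ideal_dcg == 0:
        return 0.0
    return dcg_at_k(ranked_relevances, k) / ideal_dcg

# Hypothetical example: a successful injection promotes a non-relevant
# document past a relevant one, lowering nDCG@10.
clean_ranking    = [1, 1, 0, 0]  # relevance labels in ranked order
attacked_ranking = [0, 1, 1, 0]  # injected doc displaces a relevant doc

print(ndcg_at_k(clean_ranking))     # 1.0 (ideal ordering)
print(ndcg_at_k(attacked_ranking))  # < 1.0
```

The drop between the two nDCG@10 values is the kind of operational impact the study quantifies; ASR, by contrast, simply counts the fraction of injection attempts that flip the ranker's preference.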