🤖 AI Summary
This study systematically investigates the robustness of LLM-based scientific peer review systems—including both illicit and legitimate platforms—against PDF-level indirect prompt injection attacks, focusing on a novel security threat: malicious reversal of editorial decisions from “rejection” to “acceptance.”
Method: We propose WAVS (Weighted Adversarial Vulnerability Score), a new metric quantifying model susceptibility; construct the first academic-review-specific adversarial dataset comprising 200 papers; and design 15 domain-adapted, PDF-native indirect injection strategies—including semantic obfuscation and steganographic techniques.
Contribution/Results: We evaluate 13 mainstream models (e.g., GPT-5, Claude Haiku, DeepSeek), achieving high decision-reversal rates. All data and the injection framework are publicly released, establishing a benchmark resource and methodological foundation for AI-assisted peer review security research.
📝 Abstract
The landscape of scientific peer review is rapidly evolving with the integration of Large Language Models (LLMs). This shift is driven by two parallel trends: the widespread individual adoption of LLMs by reviewers to manage workload (the "Lazy Reviewer" hypothesis) and the formal institutional deployment of AI-powered assessment systems by conferences like AAAI and Stanford's Agents4Science. This study investigates the robustness of these "LLM-as-a-Judge" systems (both illicit and sanctioned) to adversarial PDF manipulation. Unlike general jailbreaks, we focus on a distinct incentive: flipping "Reject" decisions to "Accept," for which we develop a novel evaluation metric which we term as WAVS (Weighted Adversarial Vulnerability Score). We curated a dataset of 200 scientific papers and adapted 15 domain-specific attack strategies to this task, evaluating them across 13 Language Models, including GPT-5, Claude Haiku, and DeepSeek. Our results demonstrate that obfuscation strategies like "Maximum Mark Magyk" successfully manipulate scores, achieving alarming decision flip rates even in large-scale models. We will release our complete dataset and injection framework to facilitate more research on this topic.