ChatGPT: Excellent Paper! Accept It. Editor: Imposter Found! Review Rejected

📅 2025-12-23

📈 Citations: 0

✨ Influential: 0

career value

182K/year

🤖 AI Summary

Misuse of large language models (LLMs) in scientific writing and peer review is precipitating an academic trust crisis: generated content often lacks originality, fabricates results, and embeds implicit biases—potentially misleading downstream research and compromising safety-critical domains such as healthcare. This paper introduces the first bidirectional LLM adversarial framework tailored to academic review. On the attack side, it employs PDF steganography for implicit prompt injection to manipulate LLM review judgments. On the defense side, it innovatively repurposes prompt injection techniques as detection tools, establishing an “injection–detection” mechanism that identifies spurious reviews via trigger-word analysis. The method integrates steganographic embedding, behavioral pattern analysis, adversarial prompt engineering, and interpretable log auditing. Experiments across mainstream LLMs achieve >92% accuracy in detecting fabricated reviews. We further propose a deployable journal alert protocol, balancing technical feasibility with ethical safeguards.

Technology Category

Application Category

📝 Abstract

Large Language Models (LLMs) like ChatGPT are now widely used in writing and reviewing scientific papers. While this trend accelerates publication growth and reduces human workload, it also introduces serious risks. Papers written or reviewed by LLMs may lack real novelty, contain fabricated or biased results, or mislead downstream research that others depend on. Such issues can damage reputations, waste resources, and even endanger lives when flawed studies influence medical or safety-critical systems. This research explores both the offensive and defensive sides of this growing threat. On the attack side, we demonstrate how an author can inject hidden prompts inside a PDF that secretly guide or "jailbreak" LLM reviewers into giving overly positive feedback and biased acceptance. On the defense side, we propose an "inject-and-detect" strategy for editors, where invisible trigger prompts are embedded into papers; if a review repeats or reacts to these triggers, it reveals that the review was generated by an LLM, not a human. This method turns prompt injections from vulnerability into a verification tool. We outline our design, expected model behaviors, and ethical safeguards for deployment. The goal is to expose how fragile today's peer-review process becomes under LLM influence and how editorial awareness can help restore trust in scientific evaluation.

Problem

Research questions and friction points this paper is trying to address.

Detect LLM-generated reviews in scientific publishing

Expose hidden prompt injections that bias peer review

Restore trust in peer review by identifying automated feedback

Innovation

Methods, ideas, or system contributions that make the work stand out.

Inject hidden prompts to jailbreak LLM reviewers

Embed invisible triggers to detect LLM-generated reviews

Turn prompt injections into verification tool

🔎 Similar Papers

The Great AI Witch Hunt: Reviewers Perception and (Mis)Conception of Generative AI in Research Writing

2024-06-27Computers in Human BehaviorCitations: 19

Anthropic

$350,000—$500,000 USD

San Francisco, CA, USA

Authors to Follow