🤖 AI Summary
The increasing use of large language models (LLMs) in academic peer review poses a critical challenge: reliably detecting LLM-generated text to safeguard review integrity. Existing detection methods lack robustness and practical deployability in real-world review settings.
Method: This paper introduces Topic-Based Watermarking (TBW), the first lightweight, semantics-aware, topic-driven watermarking method specifically designed for authentic peer review scenarios. TBW integrates LLM-derived text embeddings with a robust watermark detection framework, enabling verifiable provenance tracing of LLM-generated content without compromising review quality.
Contribution/Results: Evaluated on real conference review data, TBW maintains statistically insignificant degradation in review quality while achieving >92% detection accuracy against diverse paraphrasing attacks. Designed with security, practicality, and regulatory compliance in mind, TBW constitutes the first production-ready watermarking solution tailored to end-to-end academic peer review workflows.
📝 Abstract
Large language models (LLMs) are increasingly integrated into academic workflows, with many conferences and journals permitting their use for tasks such as language refinement and literature summarization. However, their use in peer review remains prohibited due to concerns around confidentiality breaches, hallucinated content, and inconsistent evaluations. As LLM-generated text becomes more indistinguishable from human writing, there is a growing need for reliable attribution mechanisms to preserve the integrity of the review process. In this work, we evaluate topic-based watermarking (TBW), a lightweight, semantic-aware technique designed to embed detectable signals into LLM-generated text. We conduct a comprehensive assessment across multiple LLM configurations, including base, few-shot, and fine-tuned variants, using authentic peer review data from academic conferences. Our results show that TBW maintains review quality relative to non-watermarked outputs, while demonstrating strong robustness to paraphrasing-based evasion. These findings highlight the viability of TBW as a minimally intrusive and practical solution for enforcing LLM usage in peer review.