🤖 AI Summary
This work identifies a critical, previously overlooked issue in LLM watermark detection: during human–LLM dialogues, both humans and non-watermarked LLMs unconsciously mimic the statistical properties of prior generated text, including watermark signals, leading to substantially increased false-positive rates and degraded long-term robustness. We introduce the concept of "mimicry," challenging the foundational assumption that watermark signals are carried exclusively by the generator. Through controlled dialogue experiments, statistical modeling, and comparative analysis of human and model outputs, we empirically demonstrate that cross-subject mimicry induces high false-positive rates across diverse settings. Our key contribution is twofold: for watermarking to remain effective over time, (1) the likelihood of false positives must be drastically reduced, and (2) longer n-gram sequences should be adopted as watermark seeds. These findings provide essential theoretical corrections and concrete design principles for next-generation reliable watermarking mechanisms.
📝 Abstract
Recent advancements in Large Language Models (LLMs) have raised concerns over potential misuse, such as spreading misinformation. In response, two countermeasures emerged: machine learning-based detectors that predict whether text is synthetic, and LLM watermarking, which subtly marks generated text for identification and attribution. Meanwhile, humans are known to adjust their language to their conversational partners both syntactically and lexically. By implication, it is possible that humans or unwatermarked LLMs could unintentionally mimic properties of LLM-generated text, making these countermeasures unreliable. In this work we investigate the extent to which such conversational adaptation happens. We call the concept $\textit{mimicry}$ and demonstrate that both humans and LLMs end up mimicking, including the watermark signal, even in seemingly improbable settings. This challenges current academic assumptions and suggests that for long-term watermarking to be reliable, the likelihood of false positives needs to be significantly lower, while longer word sequences should be used for seeding watermarking mechanisms.
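To make the abstract's two design levers concrete, here is a minimal, hypothetical sketch of an n-gram-seeded "green list" watermark detector in the style of distribution-shifting schemes. All function names, the hash-based green-list stand-in, and the parameter `gamma` (expected green fraction under the null) are illustrative assumptions, not the paper's actual implementation. The sketch shows why seeding on longer n-grams helps: a mimicking writer must reproduce the exact preceding context, not just individual words, for a token to count as green by chance.

```python
import hashlib
import math


def green_fraction(tokens, ngram=1, gamma=0.5):
    """Fraction of tokens that land in a pseudo-random 'green list'
    seeded by the preceding n-gram. A hash threshold stands in for
    a real vocabulary partition."""
    hits, total = 0, 0
    for i in range(ngram, len(tokens)):
        # The seed is the preceding n-gram; with larger n, a copied
        # phrase must match more context to trigger the same partition.
        seed = " ".join(tokens[i - ngram:i])
        digest = hashlib.sha256((seed + "|" + tokens[i]).encode()).digest()
        if digest[0] / 255.0 < gamma:  # token falls in the green list
            hits += 1
        total += 1
    return hits / max(total, 1)


def z_score(frac, n, gamma=0.5):
    """One-proportion z-test: deviation of the observed green fraction
    from gamma under the null hypothesis (no watermark). The detection
    threshold on this score directly sets the false-positive rate."""
    return (frac - gamma) * math.sqrt(n) / math.sqrt(gamma * (1 - gamma))
```

Under this framing, the abstract's two recommendations map onto two knobs: raising the z-score threshold lowers the false-positive rate, and increasing `ngram` makes accidental or mimicked matches to the seeding context far less likely.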