AI Summary
To address the challenge of tracing text generated by large language models (LLMs), this paper proposes SimMark, a post-hoc watermarking algorithm that requires no access to model internals (e.g., logits) and is therefore compatible with API-only, black-box LLMs. Methodologically, SimMark leverages semantic sentence embeddings (e.g., Sentence-BERT) and cosine similarity, using rejection sampling to embed statistically detectable yet human-imperceptible patterns. It introduces a novel soft-counting mechanism based on sentence-level semantic similarity, markedly enhancing robustness against paraphrasing attacks. Experiments across diverse domains demonstrate that SimMark consistently outperforms existing sentence-level watermarking methods: it achieves higher detection accuracy, superior resistance to paraphrasing, and improved sampling efficiency, all without degrading text quality. SimMark thus establishes a new benchmark for reliable provenance tracking of LLM-generated content.
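The core loop described above, rejection sampling on inter-sentence similarity at generation time, and soft counting of in-interval similarities at detection time, can be sketched as follows. This is a minimal illustration, not the paper's implementation: the `embed` function is a hash-based stand-in for a real semantic encoder such as Sentence-BERT, and the interval bounds and partial-credit slack are hypothetical parameters chosen for demonstration.

```python
import hashlib
import math

def embed(sentence):
    # Hypothetical stand-in for a semantic encoder (e.g., Sentence-BERT):
    # maps a sentence to a deterministic pseudo-embedding, for illustration only.
    digest = hashlib.sha256(sentence.encode()).digest()
    return [b / 255.0 for b in digest]

def cosine(u, v):
    # Cosine similarity between two embedding vectors.
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def watermark_next_sentence(prev_sentence, candidates, low=0.6, high=0.9):
    """Rejection sampling: accept the first candidate sentence whose
    similarity to the previous sentence falls inside the valid interval
    [low, high]; otherwise fall back to the last candidate unmarked."""
    prev_emb = embed(prev_sentence)
    for cand in candidates:
        if low <= cosine(prev_emb, embed(cand)) <= high:
            return cand, True    # accepted: carries the watermark signal
    return candidates[-1], False  # all rejected: emit fallback

def soft_count(sentences, low=0.6, high=0.9, slack=0.05):
    """Soft counting at detection time: similarities that narrowly miss
    the interval earn partial credit, so a paraphrase that perturbs
    embeddings slightly does not erase the statistical signal."""
    score = 0.0
    for prev, cur in zip(sentences, sentences[1:]):
        sim = cosine(embed(prev), embed(cur))
        if low <= sim <= high:
            score += 1.0
        elif low - slack <= sim <= high + slack:
            score += 0.5  # near-miss: partial credit
    return score
```

A detector would compare `soft_count` against the count expected from unwatermarked text and flag statistically significant excesses; the partial credit is what distinguishes soft counting from a hard in/out-of-interval test.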
Abstract
The rapid proliferation of large language models (LLMs) has created an urgent need for reliable methods to detect whether a text was generated by such models. In this paper, we propose SimMark, a post-hoc watermarking algorithm that makes LLM outputs traceable without requiring access to the model's internal logits, enabling compatibility with a wide range of LLMs, including API-only models. By leveraging the similarity of semantic sentence embeddings and rejection sampling to impose detectable statistical patterns imperceptible to humans, and by employing a soft counting mechanism, SimMark achieves robustness against paraphrasing attacks. Experimental results demonstrate that SimMark sets a new benchmark for robust watermarking of LLM-generated content, surpassing prior sentence-level watermarking techniques in robustness, sampling efficiency, and applicability across diverse domains, all while preserving text quality.