🤖 AI Summary
To address copyright protection needs for large language model (LLM)-generated text, this paper conducts a systematic literature review to establish the first comprehensive taxonomy for LLM text watermarking. Departing from conventional technique-centric taxonomies, we propose an original “intention-driven” three-dimensional framework encompassing design objectives (e.g., verifiability, robustness), dataset characteristics, and embedding/removal methodologies—while uniquely integrating evaluation benchmarks, application scenarios, and technical limitations. Our analysis uncovers critical gaps in dynamic content adaptation, cross-model generalizability, preservation of human readability, and standardized evaluation. The taxonomy unifies fragmented research efforts and provides a coherent theoretical foundation alongside an extensible research roadmap, thereby advancing foundational support for authorship rights protection in generative text.
📝 Abstract
With the rapid growth of Large Language Models (LLMs), safeguarding textual content against unauthorized use is crucial. Text watermarking offers a vital solution, protecting both LLM-generated and plain-text sources. This paper presents a unified overview of the perspectives behind designing watermarking techniques, through a comprehensive survey of the research literature. Our work has two key advantages: (1) we analyze research based on the specific intentions behind different watermarking techniques, the evaluation datasets used, and the watermark embedding and removal methods, to construct a cohesive taxonomy; (2) we highlight the gaps and open challenges in text watermarking to promote research in protecting text authorship. This extensive coverage and detailed analysis set our work apart, offering valuable insights into the evolving landscape of text watermarking in language models.
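To make the embedding/detection distinction concrete, below is a minimal, hypothetical sketch of one well-known family of LLM watermarks the survey covers: statistical "green list" token biasing, where the vocabulary is pseudo-randomly partitioned at each step (seeded by the previous token) and detection counts how often generated tokens fall in the green partition. The function names and parameters here are illustrative assumptions, not the paper's own method.

```python
import hashlib

# Hypothetical sketch of a green-list watermark (illustrative only; not
# the survey's method). The previous token seeds a pseudo-random
# partition of the vocabulary into "green" and "red" halves.

def green_list(prev_token: str, vocab: list[str], fraction: float = 0.5) -> set[str]:
    """Return the pseudo-random 'green' subset of the vocabulary,
    keyed on the previous token (acts as the watermark seed)."""
    greens = set()
    for tok in vocab:
        digest = hashlib.sha256(f"{prev_token}|{tok}".encode()).digest()
        if digest[0] / 255.0 < fraction:  # roughly `fraction` of tokens are green
            greens.add(tok)
    return greens

def green_fraction(tokens: list[str], vocab: list[str]) -> float:
    """Detection statistic: fraction of tokens landing in the green list.
    Watermarked text should score well above `fraction`; unwatermarked
    text should score near it."""
    hits = 0
    for prev, cur in zip(tokens, tokens[1:]):
        if cur in green_list(prev, vocab):
            hits += 1
    return hits / max(len(tokens) - 1, 1)
```

During generation, an embedder would bias sampling toward `green_list(prev_token, vocab)`; a detector with the same seed recomputes the partition and tests whether `green_fraction` is statistically above chance.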