LLM-Text Watermarking based on Lagrange Interpolation

📅 2025-05-09

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Existing text watermarking methods struggle to reliably attribute large language model (LLM)-generated content, especially under strong adversarial edits such as rewriting or aggressive truncation. To address this, we propose a novel text watermarking scheme based on Lagrange interpolation, the first to leverage this mathematical framework for textual watermarking. Our method embeds a sequence of collinear points into the text, enabling unique recovery of the author identifier from as few as three surviving points. It integrates pseudorandom sequences (LFSR/NFSR), implicit text-level coordinate encoding, and a tamper-resilient decoding mechanism to significantly enhance robustness. Experiments demonstrate near-perfect author identification accuracy (>99%) even when 90% of the watermarked text is maliciously altered or deleted, substantially outperforming state-of-the-art approaches. This work provides an efficient, reliable solution for copyright protection and provenance tracing of LLM-generated content.

📝 Abstract

The rapid advancement of Large Language Models (LLMs) has established them as a foundational technology for many AI- and ML-powered human-computer interactions. A critical challenge in this context is the attribution of LLM-generated text, either to the specific language model used or to the individual user who generated it. This is essential for combating misinformation, fake news, misinterpretation, and plagiarism. One of the key techniques for addressing this issue is watermarking. This work presents a watermarking scheme for LLM-generated text based on Lagrange interpolation, which enables the recovery of a secret author identity even when the text has been heavily redacted by an adversary. The core idea is to embed a continuous sequence of points (x, f(x)) that lie on a single straight line. The x-coordinates are generated pseudorandomly using either an LFSR (when security is not a priority) or a cryptographically secure NFSR for high-security applications. The scheme's efficiency and resilience to adversarial modifications are analysed. Experimental results show that the proposed method is highly effective, allowing the recovery of the author identity when as few as three points survive adversarial manipulation.
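The recovery step described in the abstract can be illustrated with a minimal Python sketch. It is not the paper's implementation, and several details are assumptions made only for illustration: arithmetic is done modulo a hypothetical prime `P`, the author identity is taken to be the intercept f(0) of the line, and the names `line_points` and `lagrange_at_zero` are invented here. How points are actually encoded into and extracted from text is not shown.

```python
# Hypothetical prime modulus; the source does not specify the field used.
P = 2**31 - 1


def line_points(secret, slope, xs, p=P):
    """Return points (x, f(x)) on the line f(x) = slope * x + secret (mod p).

    Assumption: the secret author identity is stored as the intercept f(0).
    """
    return [(x, (slope * x + secret) % p) for x in xs]


def lagrange_at_zero(points, p=P):
    """Evaluate the Lagrange interpolating polynomial at x = 0 (mod p).

    The x-coordinates must be distinct modulo p. If all points lie on one
    line, the interpolating polynomial is that line, so the result is f(0).
    """
    total = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = (num * -xj) % p          # factor (0 - xj)
                den = (den * (xi - xj)) % p    # factor (xi - xj)
        # pow(den, -1, p) is the modular inverse (Python 3.8+).
        total = (total + yi * num * pow(den, -1, p)) % p
    return total


# Embed five points, then pretend an adversary destroyed two of them.
secret_id = 123456
points = line_points(secret_id, slope=987654, xs=[5, 17, 42, 99, 1000])
survivors = [points[0], points[2], points[4]]
recovered = lagrange_at_zero(survivors)
```

Any three surviving points with distinct x-coordinates yield the same value here, which is why the order and position of the deleted points do not matter in this sketch.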

Problem

Research questions and friction points this paper is trying to address.

Attribution of LLM-generated text to specific models or users

Combating misinformation and plagiarism via watermarking

Recovering secret author identity from heavily redacted text

Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses Lagrange interpolation for watermarking

Embeds the secret identity as points on a straight line

Supports LFSR or NFSR for coordinate generation
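As a rough illustration of LFSR-based coordinate generation, the sketch below uses a standard 16-bit Fibonacci LFSR with taps at bits 16, 14, 13, and 11. The register width, taps, seed, and the name `lfsr16` are assumptions chosen for the example; the source does not state which LFSR or NFSR parameters the scheme uses, and a plain LFSR like this one is not cryptographically secure.

```python
from itertools import islice


def lfsr16(seed):
    """Yield successive states of a 16-bit Fibonacci LFSR.

    Feedback polynomial (assumed for illustration): x^16 + x^14 + x^13 + x^11 + 1.
    The seed must be nonzero, because the all-zero state never changes.
    """
    state = seed & 0xFFFF
    if state == 0:
        raise ValueError("seed must be nonzero")
    while True:
        # XOR the tapped bits to form the new high bit.
        bit = (state ^ (state >> 2) ^ (state >> 3) ^ (state >> 5)) & 1
        state = (state >> 1) | (bit << 15)
        yield state


# Draw x-coordinates from the generator; the seed acts as a shared secret
# in this sketch, so the detector can regenerate the same sequence.
xs = list(islice(lfsr16(0xACE1), 1000))
```

Because the state never reaches zero and does not repeat within this range, the generated values can serve directly as distinct, nonzero x-coordinates for interpolation.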

🔎 Similar Papers

From Intentions to Techniques: A Comprehensive Taxonomy and Challenges in Text Watermarking for Large Language Models