R-CoT: A Reasoning-Layer Watermark via Redundant Chain-of-Thought in Large Language Models

📅 2026-04-28

🤖 AI Summary
This work addresses the limited robustness of existing large language model watermarking methods under output perturbations or post-training modifications such as fine-tuning. To overcome this, the authors propose a reasoning-layer watermarking framework based on Redundant Chain-of-Thought (R-CoT), which embeds watermarks into the model's internal reasoning pathways rather than its output distribution, thereby internalizing them as distinctive reasoning strategies. Leveraging a GRPO-based dual-trajectory optimization mechanism, the method concurrently constructs native and watermarked reasoning paths within a shared parameter space, enabling their synergistic coexistence. Experimental results demonstrate that the approach maintains a true positive watermark detection rate above 95% across diverse post-training scenarios, significantly outperforming current techniques in both effectiveness and robustness.
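The summary names GRPO as the basis of the dual-trajectory optimization. As a rough illustration only, the sketch below shows the group-relative advantage that GRPO-style training generally relies on: each sampled completion's reward is standardized against the other completions in its group. The reward values, the split into a "native" and a "watermarked" group, and the use of the population standard deviation are all assumptions made for illustration; the paper's actual reward design and objective are not described here.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Standardize rewards within one sampled group (GRPO-style baseline).

    Each advantage is (reward - group mean) / (group std + eps), so no
    separate value network is needed to form the baseline.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; an assumption of this sketch
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical rewards for two groups of sampled completions:
# one scored on the native task, one scored on following the
# watermarked reasoning path. The numbers are invented.
native_rewards = [1.0, 0.0, 1.0, 1.0]
watermark_rewards = [0.2, 0.9, 0.4, 0.9]

native_adv = group_relative_advantages(native_rewards)
watermark_adv = group_relative_advantages(watermark_rewards)
```

Normalizing each group separately, as assumed here, keeps one trajectory's reward scale from dominating the other's; whether R-CoT does this is not stated in the summary.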
📝 Abstract
Large language models (LLMs) are widely deployed across many scenarios owing to their reasoning capabilities. To deter misuse, watermarking is commonly used to establish model ownership. However, most existing watermarking methods rely on superficial modifications to the model's output distribution, which leaves the watermark vulnerable to perturbation and removal. To address this, the paper introduces a reasoning-layer framework termed Redundant Chain-of-Thought (R-CoT), which embeds the watermark into the reasoning path. A dual-trajectory optimization mechanism based on GRPO enables the native and watermarked reasoning paths to coexist within a shared parameter space, internalizing the watermark as a distinct reasoning policy. Because the watermark resides in the model's stable reasoning path, it avoids the failures caused by output-level perturbations. Experimental results show that, compared with existing methods, R-CoT achieves high watermark effectiveness and strong robustness. Under fine-tuning and other post-training operations, the true positive rate (TPR) consistently remains above 95%, with only marginal degradation.
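For readers unfamiliar with the reported metric, the true positive rate is the fraction of genuinely watermarked cases that a detector flags as watermarked. The sketch below computes it from a list of detector verdicts. The verdict list is invented for illustration and the detector itself is not modeled; the abstract does not describe how R-CoT's detector works.

```python
def true_positive_rate(verdicts):
    """Return TP / (TP + FN) over verdicts for watermarked cases only.

    `verdicts` holds one boolean per watermarked case: True if the
    detector flagged it (true positive), False if it missed (false negative).
    """
    if not verdicts:
        raise ValueError("need at least one watermarked case")
    return sum(verdicts) / len(verdicts)


# Hypothetical outcome: 19 of 20 watermarked cases detected after fine-tuning.
verdicts_after_finetune = [True] * 19 + [False]
tpr = true_positive_rate(verdicts_after_finetune)
```

TPR alone says nothing about false alarms on unwatermarked models, so it is normally read alongside a false positive rate.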
Problem

Research questions and friction points this paper is trying to address.

watermarking
large language models
reasoning path
robustness
output perturbation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Redundant Chain-of-Thought
reasoning-layer watermark
dual-trajectory optimization
GRPO
robust watermarking
🔎 Similar Papers