Anchored Decoding: Provably Reducing Copyright Risk for Any Language Model

📅 2026-02-06
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the legal and compliance risks posed by language models reproducing copyrighted content during text generation. The authors propose Anchored Decoding, a decoding-time method that constrains generation to stay close to the output of a safe reference model without modifying the original model, thereby reducing verbatim copying. The approach supports user-defined information budgets to balance utility and safety, provides the first sequence-level guarantees for copyright risk control across arbitrary language models, and introduces a byte-level fusion mechanism enabling collaboration between models with disjoint token vocabularies. The authors also release TinyComma 1.8B, an open-source safe reference model. Experiments across six model families show that Anchored Decoding eliminates up to 75% of measurable replication gaps with minimal inference overhead, while preserving text fluency and factual consistency, establishing a new Pareto frontier in the trade-off between risk mitigation and generation quality.

📝 Abstract
Modern language models (LMs) tend to memorize portions of their training data and emit verbatim spans. When the underlying sources are sensitive or copyright-protected, such reproduction raises issues of consent and compensation for creators and compliance risks for developers. We propose Anchored Decoding, a plug-and-play inference-time method for suppressing verbatim copying: it enables decoding from any risky LM trained on mixed-license data by keeping generation in bounded proximity to a permissively trained safe LM. Anchored Decoding adaptively allocates a user-chosen information budget over the generation trajectory and enforces per-step constraints that yield a sequence-level guarantee, enabling a tunable risk-utility trade-off. To make Anchored Decoding practically useful, we introduce a new permissively trained safe model (TinyComma 1.8B), as well as Anchored$_{\mathrm{Byte}}$ Decoding, a byte-level variant of our method that enables cross-vocabulary fusion via the ByteSampler framework (Hayase et al., 2025). We evaluate our methods across six model pairs on long-form evaluations of copyright risk and utility. Anchored and Anchored$_{\mathrm{Byte}}$ Decoding define a new Pareto frontier, preserving near-original fluency and factuality while eliminating up to 75% of the measurable copying gap (averaged over six copying metrics) between the risky baseline and a safe reference, at a modest inference overhead.
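The abstract describes enforcing per-step proximity constraints that keep the risky model's next-token distribution close to a safe reference under an information budget. A minimal sketch of one way such a per-step constraint could work, assuming proximity is measured in KL divergence and the anchored distribution is a geometric mixture of the two models' distributions (the paper's actual construction, budget allocation, and guarantee may differ):

```python
# Hypothetical sketch of a per-step "anchored" decoding constraint.
# Assumptions (not from the paper): proximity is KL divergence to the safe
# distribution, and the anchored distribution is a geometric mixture
# p_lam ∝ risky^lam * safe^(1-lam), with lam chosen by binary search.
import numpy as np

def kl(p, q):
    """KL(p || q) in nats for dense probability vectors."""
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

def geometric_mix(risky, safe, lam):
    """Geometric interpolation between the two next-token distributions."""
    logits = lam * np.log(risky) + (1.0 - lam) * np.log(safe)
    w = np.exp(logits - logits.max())
    return w / w.sum()

def anchored_step(risky, safe, eps, iters=50):
    """Largest-lam mixture whose KL to `safe` stays within the step budget eps."""
    if kl(geometric_mix(risky, safe, 1.0), safe) <= eps:
        return risky  # risky distribution is already within budget
    lo, hi = 0.0, 1.0
    for _ in range(iters):  # bisect on lam; KL grows monotonically in lam
        mid = 0.5 * (lo + hi)
        if kl(geometric_mix(risky, safe, mid), safe) <= eps:
            lo = mid
        else:
            hi = mid
    return geometric_mix(risky, safe, lo)

# Toy example: 4-token vocabulary, per-step budget of 0.05 nats.
risky = np.array([0.70, 0.10, 0.10, 0.10])
safe = np.array([0.25, 0.25, 0.25, 0.25])
out = anchored_step(risky, safe, eps=0.05)
```

In this sketch a larger per-step budget lets the output track the risky model more closely, while a budget of zero reduces decoding to the safe reference; summing the per-step budgets over the trajectory is one simple way to obtain a sequence-level bound.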
Problem

Research questions and friction points this paper is trying to address.

copyright risk
verbatim copying
language models
compliance
training data memorization
Innovation

Methods, ideas, or system contributions that make the work stand out.

Anchored Decoding
copyright risk mitigation
inference-time control
safe language modeling
byte-level generation