Un-Attributability: Computing Novelty From Retrieval & Semantic Similarity

📅 2025-10-31
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the challenge of quantifying semantic novelty in language-model outputs, proposing *un-attributability*, the inability to retrieve any semantically similar pretraining-corpus sample as a source for an output, as a formal, interpretable metric. Methodologically, it introduces a two-stage retrieval pipeline: efficient coarse-grained indexing with GIST embeddings, followed by fine-grained reranking with ColBERTv2, with attribution thresholds calibrated against human-written text. The paper provides the first formal definition and empirical evaluation of un-attributability. Experiments on the SmolLM family yield three key findings: (1) instruction tuning increases output novelty rather than merely altering style; (2) models draw on pretraining data over much longer spans than previously reported; and (3) domain-specific characteristics systematically promote or suppress unattributable outputs. By grounding novelty assessment in semantic retrieval rather than surface-level heuristics, the work establishes a scalable, principled framework for evaluating generative originality, with practical relevance for safety, copyright, and alignment research.

📝 Abstract
Understanding how language-model outputs relate to the pretraining corpus is central to studying model behavior. Most training data attribution (TDA) methods ask which training examples causally influence a given output, often using leave-one-out tests. We invert the question: which outputs cannot be attributed to any pretraining example? We introduce un-attributability as an operational measure of semantic novelty: an output is novel if the pretraining corpus contains no semantically similar context. We approximate this with a simple two-stage retrieval pipeline: index the corpus with lightweight GIST embeddings, retrieve the top-n candidates, then rerank with ColBERTv2. If the nearest corpus item is less attributable than a human-generated text reference, we consider the output of the model as novel. We evaluate on SmolLM and SmolLM2 and report three findings: (1) models draw on pretraining data across much longer spans than previously reported; (2) some domains systematically promote or suppress novelty; and (3) instruction tuning not only alters style but also increases novelty. Reframing novelty assessment around un-attributability enables efficient analysis at pretraining scale. We release ~20 TB of corpus chunks and index artifacts to support replication and large-scale extension of our analysis at https://huggingface.co/datasets/stai-tuebingen/faiss-smollm
Problem

Research questions and friction points this paper is trying to address.

Measuring semantic novelty in language model outputs
Identifying outputs that cannot be attributed to any pretraining example
Assessing novelty via retrieval and similarity methods
Innovation

Methods, ideas, or system contributions that make the work stand out.

Un-attributability measures semantic novelty via retrieval
Two-stage pipeline uses GIST embeddings and ColBERTv2 reranking
Compares model outputs to human text for novelty assessment
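
The two-stage pipeline can be sketched as follows. This is a minimal illustration, not the paper's implementation: random NumPy vectors stand in for GIST sentence embeddings and ColBERTv2 token embeddings, brute-force cosine search stands in for the FAISS index, and all array sizes, function names, and the threshold are hypothetical. Only the pipeline shape (coarse single-vector retrieval, MaxSim-style late-interaction rerank, comparison against a human-calibrated threshold) follows the description above.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy corpus: one coarse vector per chunk (stand-in for GIST embeddings)
# plus per-token vectors (stand-in for ColBERTv2 embeddings). Sizes are
# arbitrary for illustration; the real index covers ~20 TB of chunks.
N_CORPUS, DIM, N_TOK = 1000, 64, 8

corpus_vecs = rng.normal(size=(N_CORPUS, DIM))
corpus_vecs /= np.linalg.norm(corpus_vecs, axis=1, keepdims=True)
corpus_toks = rng.normal(size=(N_CORPUS, N_TOK, DIM))
corpus_toks /= np.linalg.norm(corpus_toks, axis=2, keepdims=True)

def coarse_retrieve(query_vec, top_n=50):
    """Stage 1: cosine similarity against the single-vector index,
    returning the indices of the top-n candidate chunks."""
    q = query_vec / np.linalg.norm(query_vec)
    scores = corpus_vecs @ q
    return np.argsort(scores)[::-1][:top_n]

def maxsim(query_toks, doc_toks):
    """ColBERT-style late interaction: for each query token take the max
    cosine similarity over document tokens, then sum over query tokens."""
    sim = query_toks @ doc_toks.T          # (n_query_tok, n_doc_tok)
    return float(sim.max(axis=1).sum())

def rerank(query_toks, candidates):
    """Stage 2: rescore the coarse candidates with MaxSim and return the
    best-matching chunk index together with its score."""
    scores = [maxsim(query_toks, corpus_toks[i]) for i in candidates]
    j = int(np.argmax(scores))
    return candidates[j], scores[j]

def is_novel(query_vec, query_toks, human_threshold):
    """An output is un-attributable (novel) when even its nearest corpus
    chunk scores below a threshold calibrated on human-written text."""
    cands = coarse_retrieve(query_vec)
    _, score = rerank(query_toks, cands)
    return score < human_threshold
```

The threshold calibration is the key design choice: rather than picking an absolute similarity cutoff, the paper anchors attribution against how attributable known human-written text is, so `human_threshold` would come from running the same pipeline over a human reference set.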