🤖 AI Summary
This work addresses the challenge of output attribution and forgery detection for large language models (LLMs). We propose a parameter-free, geometric authentication method based on intrinsic properties of output logits. We observe that logits from distinct LLMs naturally reside on characteristic ellipsoidal surfaces in high-dimensional space—a model-specific, self-contained, compact, and forgery-resistant “ellipsoidal signature.” Crucially, this signature is extractable solely from the output logits, requiring neither input text nor model weights, enabling both model provenance identification and output authenticity verification. Experiments across diverse small-scale models confirm the uniqueness and detectability of these signatures. Leveraging this property, we design a symmetric-key–like output authentication protocol. To our knowledge, this is the first work to transform geometric constraints inherent in LLM outputs into verifiable digital signatures, establishing a novel paradigm for model attribution and trustworthy content certification.
📝 Abstract
The ubiquity of closed-weight language models with public-facing APIs has generated interest in forensic methods, both for extracting hidden model details (e.g., parameters) and for identifying models by their outputs. One successful approach to these goals has been to exploit the geometric constraints imposed by the language model architecture and parameters. In this work, we show that a lesser-known geometric constraint, namely that language model outputs lie on the surface of a high-dimensional ellipse, functions as a signature for the model and can be used to identify the source model of a given output. This ellipse signature has unique properties that distinguish it from existing model-output association methods such as language model fingerprints. First, the signature is hard to forge: without direct access to model parameters, it is practically infeasible to produce log-probabilities (logprobs) on the ellipse. Second, the signature is naturally occurring, since all language models are subject to these elliptical constraints. Third, the signature is self-contained, in that it is detectable without access to the model inputs or the full weights. Finally, the signature is compact and redundant, as it is independently detectable in each logprob output from the model. We evaluate a novel technique for extracting the ellipse from small models and discuss the practical hurdles that make it infeasible for production-scale models. Finally, we use ellipse signatures to propose a protocol for language model output verification, analogous to cryptographic symmetric-key message authentication systems.
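The geometric constraint and the verification idea above can be sketched in a toy NumPy example. The premise is that a model's final hidden state is norm-constrained (by LayerNorm/RMSNorm) before the unembedding matrix maps it to logits, so every logit vector is the linear image of a point on a sphere, i.e. it lies on a fixed ellipsoid. A verifier who holds the ellipse parameters (here, a stand-in unembedding matrix `W`, playing the role of the shared secret) can test membership on that surface without seeing the input or the rest of the weights. All names and dimensions below are illustrative assumptions, not the paper's implementation:

```python
import numpy as np

rng = np.random.default_rng(0)
d, vocab = 16, 200  # toy hidden dimension and vocabulary size (assumptions)

# Hypothetical stand-in for a model's final RMSNorm + unembedding layer.
W = rng.normal(size=(vocab, d))  # unembedding matrix (full column rank)
W_pinv = np.linalg.pinv(W)

def model_logits(h):
    """Map a raw hidden state to logits: RMS-normalize, then unembed."""
    u = h * np.sqrt(d) / np.linalg.norm(h)  # ||u|| = sqrt(d): u is on a sphere
    return W @ u                            # linear image of a sphere = ellipsoid

def on_ellipse(z, tol=1e-8):
    """Verifier's check: does logit vector z lie on the model's ellipsoid?"""
    x = W_pinv @ z  # project back to hidden space
    in_span = np.linalg.norm(W @ x - z) < tol               # z in column space of W
    on_sphere = abs(np.linalg.norm(x) - np.sqrt(d)) < tol   # preimage on the sphere
    return bool(in_span and on_sphere)

genuine = model_logits(rng.normal(size=d))
forged = genuine + 1e-3 * rng.normal(size=vocab)  # small perturbation off the surface
print(on_ellipse(genuine), on_ellipse(forged))    # True False
```

This mirrors the symmetric-key analogy: both signer (the model) and verifier effectively share the ellipse parameters, while a forger who lacks them has no practical way to place a logprob vector exactly on the surface, since even tiny perturbations leave the low-dimensional ellipsoid embedded in vocabulary space.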