🤖 AI Summary
This work addresses the challenge of output attribution and forgery detection for large language models (LLMs). We propose a parameter-free, geometric authentication method based on intrinsic properties of output logits. We observe that logits from distinct LLMs naturally reside on characteristic ellipsoidal surfaces in high-dimensional space—a model-specific, self-contained, compact, and forgery-resistant “ellipsoidal signature.” Crucially, this signature is extractable solely from the output logits, requiring neither input text nor model weights, enabling both model provenance identification and output authenticity verification. Experiments across diverse small-scale models confirm the uniqueness and detectability of these signatures. Leveraging this property, we design a symmetric-key–like output authentication protocol. To our knowledge, this is the first work to transform geometric constraints inherent in LLM outputs into verifiable digital signatures, establishing a novel paradigm for model attribution and trustworthy content certification.
📝 Abstract
The ubiquity of closed-weight language models with public-facing APIs has generated interest in forensic methods, both for extracting hidden model details (e.g., parameters) and for identifying models by their outputs. One successful approach to these goals has been to exploit the geometric constraints imposed by the language model architecture and parameters. In this work, we show that a lesser-known geometric constraint, namely that language model outputs lie on the surface of a high-dimensional ellipse, functions as a signature for the model and can be used to identify the source model of a given output. This ellipse signature has unique properties that distinguish it from existing model-output association methods such as language model fingerprints. First, the signature is hard to forge: without direct access to model parameters, it is practically infeasible to produce log-probabilities (logprobs) on the ellipse. Second, the signature is naturally occurring, since all language models are subject to these elliptical constraints. Third, the signature is self-contained, in that it is detectable without access to the model inputs or the full weights. Finally, the signature is compact and redundant, as it is independently detectable in each logprob output from the model. We evaluate a novel technique for extracting the ellipse from small models and discuss the practical hurdles that make it infeasible for production-scale models. Finally, we use ellipse signatures to propose a protocol for language model output verification, analogous to cryptographic symmetric-key message authentication systems.
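The geometric constraint and the verification idea above can be sketched in a toy NumPy example. The premise is that a model's final hidden state is norm-constrained (by LayerNorm/RMSNorm) before the unembedding matrix maps it to logits, so every logit vector is the linear image of a point on a sphere, i.e. it lies on a fixed ellipsoid. A verifier who holds the ellipse parameters (here, a stand-in unembedding matrix `W`, playing the role of the shared secret) can test membership on that surface without seeing the input or the rest of the weights. All names and dimensions below are illustrative assumptions, not the paper's implementation:

```python
import numpy as np

rng = np.random.default_rng(0)
d, vocab = 16, 200  # toy hidden dimension and vocabulary size (assumptions)

# Hypothetical stand-in for a model's final RMSNorm + unembedding layer.
W = rng.normal(size=(vocab, d))  # unembedding matrix (full column rank)
W_pinv = np.linalg.pinv(W)

def model_logits(h):
    """Map a raw hidden state to logits: RMS-normalize, then unembed."""
    u = h * np.sqrt(d) / np.linalg.norm(h)  # ||u|| = sqrt(d): u is on a sphere
    return W @ u                            # linear image of a sphere = ellipsoid

def on_ellipse(z, tol=1e-8):
    """Verifier's check: does logit vector z lie on the model's ellipsoid?"""
    x = W_pinv @ z  # project back to hidden space
    in_span = np.linalg.norm(W @ x - z) < tol               # z in column space of W
    on_sphere = abs(np.linalg.norm(x) - np.sqrt(d)) < tol   # preimage on the sphere
    return bool(in_span and on_sphere)

genuine = model_logits(rng.normal(size=d))
forged = genuine + 1e-3 * rng.normal(size=vocab)  # small perturbation off the surface
print(on_ellipse(genuine), on_ellipse(forged))    # True False
```

This mirrors the symmetric-key analogy: both signer (the model) and verifier effectively share the ellipse parameters, while a forger who lacks them has no practical way to place a logprob vector exactly on the surface, since even tiny perturbations leave the low-dimensional ellipsoid embedded in vocabulary space.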