🤖 AI Summary
This work addresses the scalability limitations of fingerprinting for copyright protection of large language models (LLMs): high false-positive rates, the risk of fingerprint leakage, and vulnerability to collusion attacks by groups of users. We propose Perinucleus sampling, a fingerprint-embedding method that, for the first time, scales the number of fingerprints embeddable in a single model to 24,576 (two orders of magnitude more than prior approaches) while preserving fingerprint persistence, avoiding inference-time performance degradation, and resisting joint evasion. Combining large-scale fingerprint design, theoretical robustness modeling, and validation of retention after supervised fine-tuning (SFT), we demonstrate on Llama-3.1-8B that (i) model accuracy is unchanged after embedding, (ii) the detection rate remains 100% even after standard SFT, and (iii) both the false-positive rate and the collusion attack success rate are significantly reduced. The result is a scalable, robust scheme for precise provenance tracing and copyright attribution in API-based LLM sharing scenarios.
📝 Abstract
Model fingerprinting has emerged as a powerful tool for model owners to identify their shared model given API access. However, to lower the false discovery rate, fight fingerprint leakage, and defend against coalitions of model users attempting to bypass detection, we argue that *scalability* is critical, i.e., scaling up the number of fingerprints one can embed into a model. Hence, we pose scalability as a crucial requirement for fingerprinting schemes. We experiment with fingerprint design at a scale significantly larger than previously considered, and introduce a new method, dubbed Perinucleus sampling, to generate scalable, persistent, and harmless fingerprints. We demonstrate that this scheme can add 24,576 fingerprints to a Llama-3.1-8B model -- two orders of magnitude more than existing schemes -- without degrading the model's utility. Our inserted fingerprints persist even after supervised fine-tuning on standard post-training data. We further address security risks of fingerprinting, and show, both theoretically and empirically, how a scalable scheme like ours can mitigate these risks.
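To make the black-box setting concrete, the sketch below illustrates the general fingerprinting workflow the abstract describes: the owner embeds secret (key, response) pairs into the model, then later verifies ownership by querying a suspect API with the keys and measuring how many responses match. This is a minimal illustration under assumed simplifications, not the paper's Perinucleus sampling procedure; `make_fingerprints` and the stub models are hypothetical stand-ins for the embedding fine-tune and the deployed API.

```python
import random

def make_fingerprints(n, seed=0):
    """Generate n random key/response string pairs (illustrative stand-in
    for the owner's secret fingerprint set)."""
    rng = random.Random(seed)
    alphabet = "abcdefghijklmnopqrstuvwxyz"
    pairs = []
    for _ in range(n):
        key = "".join(rng.choice(alphabet) for _ in range(12))
        resp = "".join(rng.choice(alphabet) for _ in range(8))
        pairs.append((key, resp))
    return pairs

def detection_rate(fingerprints, query_model):
    """Query the suspect API with each key; return the fraction of keys
    whose response matches the embedded one."""
    hits = sum(1 for key, resp in fingerprints if query_model(key) == resp)
    return hits / len(fingerprints)

fps = make_fingerprints(100)

# Stub models: a "shared" model that memorized the fingerprints during
# fine-tuning, vs. an unrelated model that knows none of them.
lookup = dict(fps)
shared_model = lambda key: lookup.get(key, "")
unrelated_model = lambda key: "no-idea"

print(detection_rate(fps, shared_model))     # 1.0
print(detection_rate(fps, unrelated_model))  # 0.0
```

With only a handful of fingerprints, a single leaked key or a coalition of users comparing responses can defeat this check; scaling the fingerprint set (while keeping each fingerprint harmless and persistent) is what motivates the paper's approach.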