🤖 AI Summary
Existing black-box fingerprinting methods for large language models (LLMs) suffer from insufficient fingerprint uniqueness because they rely solely on model outputs, whose nonlinear transformations discard parameter-sensitive information. Method: We propose ZeroPrint, a novel black-box fingerprinting technique based on zeroth-order gradient estimation. ZeroPrint constructs semantics-preserving input perturbations via synonym substitution, estimates input gradients from the resulting output changes, and assembles the estimated Jacobian matrix into the fingerprint. Contribution/Results: Leveraging Fisher information theory, we establish for the first time that input gradients are more discriminative than output logits, and show that ZeroPrint effectively approximates this information-rich feature under black-box access constraints. Extensive experiments on standard benchmarks demonstrate that ZeroPrint significantly outperforms state-of-the-art methods in both fingerprint uniqueness and robustness.
📝 Abstract
The substantial investment required to develop Large Language Models (LLMs) makes them valuable intellectual property, raising significant concerns about copyright protection. LLM fingerprinting has emerged as a key technique to address this: it verifies a model's origin by extracting an intrinsic, unique signature (a "fingerprint") and comparing it to that of a source model to identify illicit copies. However, existing black-box fingerprinting methods often fail to generate distinctive LLM fingerprints. This ineffectiveness arises because black-box methods typically rely on model outputs, which lose critical information about the model's unique parameters through non-linear transformations. To address this, we first leverage Fisher information theory to formally demonstrate that the gradient with respect to the model's input is a more informative feature for fingerprinting than the output. Based on this insight, we propose ZeroPrint, a novel method that approximates these information-rich gradients in a black-box setting via zeroth-order estimation. ZeroPrint overcomes the challenge of applying this technique to discrete text by simulating input perturbations through semantics-preserving word substitutions, which allows it to estimate the model's Jacobian matrix as a unique fingerprint. Experiments on a standard benchmark show that ZeroPrint achieves state-of-the-art effectiveness and robustness, significantly outperforming existing black-box methods.
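To make the core idea concrete, here is a minimal sketch of zeroth-order Jacobian estimation with central finite differences. The coordinate-direction perturbations in continuous input space and the toy softmax "model" are our illustrative assumptions for this sketch; the paper's actual method replaces continuous perturbations with synonym substitutions on discrete text, which this example does not implement.

```python
import numpy as np

def zeroth_order_jacobian(f, x, eps=1e-3):
    """Estimate the Jacobian of a black-box function f at x using
    central finite differences along each coordinate direction.
    Only forward evaluations of f are needed (black-box access)."""
    x = np.asarray(x, dtype=float)
    d = x.size
    y0 = np.asarray(f(x))
    J = np.zeros((y0.size, d))
    for i in range(d):
        e = np.zeros(d)
        e[i] = eps
        # (f(x + eps*e_i) - f(x - eps*e_i)) / (2*eps) approximates column i
        J[:, i] = (np.asarray(f(x + e)) - np.asarray(f(x - e))) / (2 * eps)
    return J

# Toy stand-in for a black-box model: a fixed random linear map
# followed by softmax, mimicking output probabilities over 4 classes.
rng = np.random.default_rng(0)
W = rng.normal(size=(4, 6))

def model(x):
    z = W @ x
    z = z - z.max()          # numerical stability
    p = np.exp(z)
    return p / p.sum()

x = rng.normal(size=6)
J = zeroth_order_jacobian(model, x)  # shape (4, 6): the fingerprint matrix
```

Because the toy model is differentiable in closed form, the estimate can be checked against the analytic softmax Jacobian, (diag(p) - p pᵀ) W; the central-difference error is O(eps²), so the two agree closely.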