Provable Model Provenance Set for Large Language Models

📅 2026-01-31

🤖 AI Summary

This work addresses the lack of model provenance methods with provable error control, a gap that matters both for detecting unauthorized use of large language models and for attributing a model to multiple sources. It formally defines the model provenance problem with statistical guarantees and introduces the Model Provenance Set (MPS) framework, which uses sequential hypothesis testing and an adaptive exclusion mechanism to construct a small set of candidate source models that satisfies a user-specified confidence level. According to the authors, the method provides the first provable asymptotic coverage guarantee for model provenance, handles multi-source scenarios, and achieves the target provenance coverage while strictly controlling the inclusion of irrelevant models. The approach suits model attribution and auditing tasks that require rigorous statistical assurance.
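
The coverage requirement can be written compactly. The notation below is assumed for illustration and is not taken from the paper: \mathcal{S}^{\star} stands for the set of true source models, \widehat{\mathcal{S}}_{n} for the provenance set built from n observations, and \alpha for the user-specified error level. The limit reflects the asymptotic nature of the guarantee stated in the abstract.

```latex
\liminf_{n \to \infty} \Pr\left( \mathcal{S}^{\star} \subseteq \widehat{\mathcal{S}}_{n} \right) \;\ge\; 1 - \alpha
```

Read this way, a confidence level of 95% corresponds to \alpha = 0.05: in the limit, the constructed set should miss one or more true sources in at most 5% of runs.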

📝 Abstract

The growing prevalence of unauthorized model usage and misattribution has increased the need for reliable model provenance analysis. However, existing methods largely rely on heuristic fingerprint-matching rules that lack provable error control and often overlook the existence of multiple sources, leaving the reliability of their provenance claims unverified. In this work, we first formalize the model provenance problem with provable guarantees, requiring rigorous coverage of all true provenances at a prescribed confidence level. Then, we propose the Model Provenance Set (MPS), which employs a sequential test-and-exclusion procedure to adaptively construct a small set satisfying the guarantee. The key idea of MPS is to test the significance of provenance existence within a candidate pool, thereby establishing a provable asymptotic guarantee at a user-specified confidence level. Extensive experiments demonstrate that MPS effectively achieves target provenance coverage while strictly limiting the inclusion of unrelated models, and further reveal its potential for practical provenance analysis in attribution and auditing tasks.
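
The control flow of a sequential test-and-exclusion loop can be sketched as follows. This is a minimal illustration of one plausible reading of the abstract, not the paper's algorithm. The function name `provenance_set`, the per-candidate p-values it takes as input, and the Bonferroni-adjusted minimum p-value used as the "does any provenance remain in the pool?" test are all assumptions made here. With these choices the loop coincides with Holm's step-down procedure, and it makes no claim to reproduce the asymptotic coverage guarantee of MPS.

```python
from typing import Dict, List


def provenance_set(p_values: Dict[str, float], alpha: float = 0.05) -> List[str]:
    """Illustrative test-and-exclusion loop (hypothetical, not the paper's method).

    p_values maps each candidate source model to an assumed p-value for the
    null hypothesis "this candidate is unrelated to the suspect model".
    """
    pool = dict(p_values)        # candidates not yet moved into the set
    selected: List[str] = []

    while pool:
        # Global test on the remaining pool: "no provenance exists here".
        # Assumption: Bonferroni-adjusted minimum p-value as the test.
        best = min(pool, key=pool.get)
        global_p = min(1.0, len(pool) * pool[best])

        if global_p > alpha:
            break                # no significant evidence left, so stop

        selected.append(best)    # strongest candidate joins the set
        del pool[best]           # exclude it, then re-test the remainder

    return selected


if __name__ == "__main__":
    # Made-up p-values for four hypothetical candidate models.
    example = {"model_a": 0.001, "model_b": 0.004, "model_c": 0.6, "model_d": 0.9}
    print(provenance_set(example, alpha=0.05))
```

The loop stops as soon as the remaining pool shows no significant evidence of a source, which is what keeps the returned set small; how the actual MPS test statistic is computed from model fingerprints is not specified in the abstract and is left out here.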

Problem

Research questions and friction points this paper is trying to address.

model provenance

large language models

provable guarantees

attribution

unauthorized model usage

Innovation

Methods, ideas, or system contributions that make the work stand out.

provable guarantees

model provenance

sequential testing