🤖 AI Summary
This work addresses the lack of a scalable and secure mechanism for multi-client collaborative copyright verification in federated learning. To this end, the authors propose a (t, K)-threshold watermarking method that, for the first time, integrates secret sharing with white-box model watermarking. In this framework, K clients collaboratively embed a shared watermark during training, and the watermark key can be reconstructed, and zero-knowledge copyright verification performed, only when at least t clients cooperate. The mechanism resists unilateral tampering and scales well, maintaining strong detection significance (z ≥ 4) even at K = 128 with negligible impact on model accuracy, and it withstands adaptive fine-tuning attacks that leverage up to 20% of the training data.
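The threshold property described above follows the standard Shamir construction: the key is encoded as the constant term of a random degree-(t-1) polynomial, each client holds one evaluation of it, and any t evaluations recover the constant term by Lagrange interpolation while fewer reveal nothing. A minimal sketch of that construction (not the paper's actual protocol; the prime field and function names are illustrative):

```python
import random

PRIME = 2**127 - 1  # illustrative prime field for share arithmetic

def split(secret, t, k):
    """Split `secret` into k shares so that any t of them reconstruct it."""
    coeffs = [secret] + [random.randrange(PRIME) for _ in range(t - 1)]
    shares = []
    for x in range(1, k + 1):
        y = 0
        for c in reversed(coeffs):  # Horner evaluation of the polynomial at x
            y = (y * x + c) % PRIME
        shares.append((x, y))
    return shares

def reconstruct(shares):
    """Lagrange interpolation at x = 0 recovers the secret from t shares."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = (num * -xj) % PRIME
                den = (den * (xi - xj)) % PRIME
        # pow(den, PRIME - 2, PRIME) is the modular inverse of den
        secret = (secret + yi * num * pow(den, PRIME - 2, PRIME)) % PRIME
    return secret
```

Any subset of t = 3 shares yields the same key, which is what lets an arbitrary coalition of t clients run verification without involving the rest.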
📝 Abstract
In federated learning (FL), $K$ clients jointly train a model without sharing raw data. Because each participant invests data and compute, clients need mechanisms to later prove the provenance of the jointly trained model. Model watermarking embeds a hidden signal in the weights, but naive approaches either fail to scale, since per-client watermarks dilute as $K$ grows, or give any individual client the ability to verify, and potentially remove, the watermark. We introduce $(t,K)$-threshold watermarking: clients collaboratively embed a shared watermark during training, while only coalitions of at least $t$ clients can reconstruct the watermark key and verify a suspect model. We secret-share the watermark key $\tau$ so that coalitions of fewer than $t$ clients cannot reconstruct it, and verification proceeds without revealing $\tau$ in the clear. We instantiate our protocol in the white-box setting and evaluate on image classification. Our watermark remains detectable at scale ($K=128$) with minimal accuracy loss and stays above the detection threshold ($z\ge 4$) under attacks including adaptive fine-tuning using up to 20% of the training data.
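The detection threshold $z \ge 4$ can be read as a one-sided significance test: under the null hypothesis that a model carries no watermark, each key bit is recovered by chance with probability 1/2, so the match count is binomial and the z-score measures its deviation from chance. A minimal sketch of that test, assuming a sign-based white-box watermark where each bit is read off the sign of a projection of the weights onto a secret direction (the function name and bit encoding are illustrative, not the paper's exact scheme):

```python
import math

def detection_z(weights, key_dirs, bits):
    """z-score of watermark bit recovery against chance-level (p = 0.5).

    A bit is read as the sign of the inner product between the flattened
    weights and one secret key direction; `matches` counts agreements
    with the expected bit string.
    """
    n = len(bits)
    matches = sum(
        int((sum(w * k for w, k in zip(weights, d)) > 0) == b)
        for d, b in zip(key_dirs, bits)
    )
    # Binomial null: mean n/2, variance n/4
    return (matches - 0.5 * n) / math.sqrt(0.25 * n)
```

With 16 embedded bits, perfect recovery gives $z = \sqrt{16} = 4$, exactly the threshold cited; longer keys leave more headroom for attacks that flip some bits.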